[ceph-users] CephFS client df command showing raw space after adding second pool to mds

2019-01-03 Thread David C
Hi All

Luminous 12.2.12
Single MDS
Replicated pools

A 'df' on a CephFS kernel client used to show me the usable space (i.e. the
raw space with the replication overhead applied). This was when I had just
a single CephFS data pool.

After adding a second pool to the MDS and using file layouts to map a
directory to that pool, df now shows the raw space. It's not the end
of the world, but it was handy to see the usable space.

I'm fairly sure the change coincided with me adding the second pool, although
I'm not 100% sure.

I'm seeing this behavior on the latest CentOS 7.6 kernel and a 4.14 kernel.
Is this expected?

Thanks,
David
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Help with setting device-class rule on pool without causing data to move

2019-01-03 Thread David C
Thanks, Sage! That did the trick.

Wido, seems like an interesting approach but I wasn't brave enough to
attempt it!

Eric, I suppose this does the same thing that the crushtool reclassify
feature does?

Thank you both for your suggestions.

For posterity:

- I grabbed some 14.0.1 packages and extracted crushtool and
libceph-common.so.1
- Ran 'crushtool -i cm --reclassify --reclassify-root default hdd -o
cm_reclassified'
- Compared the maps with:

crushtool -i cm --compare cm_reclassified

That suggested I would get an acceptable amount of data reshuffling, which I
expected. I didn't use --set-subtree-class as I'd already added SSD drives
to the cluster.
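
For anyone finding this later, the full round trip looks roughly like this
(file names are just what I used; the last step injects the reclassified map):

ceph osd getcrushmap -o cm
crushtool -i cm --reclassify --reclassify-root default hdd -o cm_reclassified
crushtool -i cm --compare cm_reclassified
ceph osd setcrushmap -i cm_reclassified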

My ultimate goal was to migrate the cephfs_metadata pool onto SSD drives
while leaving the cephfs_data pool on the HDD drives. The device classes
feature made that really trivial: I just created an intermediary rule which
would use both HDD and SSD hosts (I didn't have any mixed devices in
hosts), set the metadata pool to use the new rule, waited for recovery and
then set the metadata pool to use an SSD-only rule. I'm not sure that
intermediary stage was strictly necessary; I was concerned about inactive
PGs.
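
In concrete terms it was roughly the following (rule names are illustrative):

# intermediary rule with no device-class restriction, so it spans all hosts
ceph osd crush rule create-replicated meta_any default host
ceph osd pool set cephfs_metadata crush_rule meta_any
# ...wait for recovery to finish, then switch to an SSD-only rule
ceph osd crush rule create-replicated meta_ssd default host ssd
ceph osd pool set cephfs_metadata crush_rule meta_ssd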

Thanks,
David

On Mon, Dec 31, 2018 at 6:06 PM Eric Goirand  wrote:

> Hi David,
>
> CERN has provided a Python script to swap the relevant bucket IDs
> (default <-> hdd); you can find it here:
>
> https://github.com/cernceph/ceph-scripts/blob/master/tools/device-class-id-swap.py
>
> The principle is the following:
> - extract the CRUSH map
> - run the script on it => it creates a new CRUSH file
> - edit the CRUSH map and modify the rule associated with the pool(s) you
> want to restrict to HDD OSDs only, replacing:
> => "step take default" with "step take default class hdd"
>
> Then recompile and reinject the new CRUSH map, and voilà!
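>
> For reference, the decompile/edit/recompile round trip is the standard one,
> roughly (file names are just examples):
>
> ceph osd getcrushmap -o cm.bin
> crushtool -d cm.bin -o cm.txt
> # edit cm.txt: change "step take default" to "step take default class hdd"
> crushtool -c cm.txt -o cm_new.bin
> ceph osd setcrushmap -i cm_new.bin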
>
> Your cluster should be using only the HDD OSDs without rebalancing (or a
> very small amount).
>
> In case you have forgotten something, just reapply the former CRUSH map
> and start again.
>
> Cheers and Happy new year 2019.
>
> Eric
>
>
>
> On Sun, Dec 30, 2018, 21:16 David C  wrote:
>
>> Hi All
>>
>> I'm trying to set the existing pools in a Luminous cluster to use the hdd
>> device-class but without moving data around. If I just create a new rule
>> using the hdd class and set my pools to use that new rule it will cause a
>> huge amount of data movement even though the pgs are all already on HDDs.
>>
>> There is a thread on ceph-large [1] which appears to have the solution
>> but I can't get my head around what I need to do. I'm not too clear on
>> which IDs I need to swap. Could someone give me some pointers on this
>> please?
>>
>> [1]
>> http://lists.ceph.com/pipermail/ceph-large-ceph.com/2018-April/000109.html
>>
>>
>>
>> ___
>> ceph-users mailing list
>> ceph-users@lists.ceph.com
>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] list admin issues

2019-01-02 Thread David Galloway


On 12/28/18 4:13 AM, Ilya Dryomov wrote:
> On Sat, Dec 22, 2018 at 7:18 PM Brian :  wrote:
>>
>> Sorry to drag this one up again.
>>
>> Just got the unsubscribed due to excessive bounces thing.
>>
>> 'Your membership in the mailing list ceph-users has been disabled due
>> to excessive bounces The last bounce received from you was dated
>> 21-Dec-2018.  You will not get any more messages from this list until
>> you re-enable your membership.  You will receive 3 more reminders like
>> this before your membership in the list is deleted.'
>>
>> can anyone check MTA logs to see what the bounce is?
> 
> Me too.  Happens regularly and only on ceph-users, not on sepia or
> ceph-maintainers, etc.  David, Dan, could you or someone you know look
> into this?
> 

As far as I know, we don't have shell access to the mail servers for
those lists, so I can't see what's going on behind the scenes.  I will
increase the bounce_score_threshold for now and change the list owner to
an active e-mail address (oops) that will receive the bounce notifications.

The Bounce Processing settings are the same for ceph-users and
ceph-maintainers so I'm guessing the high volume of ceph-users@ is why
it's only happening on that list.

I think the plan is to move to a self-hosted mailman instance soon so
this shouldn't be an issue for much longer.
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Help with setting device-class rule on pool without causing data to move

2018-12-30 Thread David C
Hi All

I'm trying to set the existing pools in a Luminous cluster to use the hdd
device-class but without moving data around. If I just create a new rule
using the hdd class and set my pools to use that new rule it will cause a
huge amount of data movement even though the pgs are all already on HDDs.

There is a thread on ceph-large [1] which appears to have the solution but
I can't get my head around what I need to do. I'm not too clear on which
IDs I need to swap. Could someone give me some pointers on this please?

[1]
http://lists.ceph.com/pipermail/ceph-large-ceph.com/2018-April/000109.html
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Bluestore nvme DB/WAL size

2018-12-21 Thread David C
I'm in a similar situation, currently running filestore with spinners and
journals on NVMe partitions which are about 1% of the size of the OSD. If I
migrate to bluestore, I'll still only have that 1% available. Per the docs,
if my block.db device fills up, the metadata is going to spill back onto
the block device, which will incur an understandable performance penalty. The
question is, will there be more of a performance hit in that scenario versus
having the block.db on the spinner and just the WAL on the NVMe?
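
My working assumption is that I could spot any spillover from the bluefs perf
counters on the OSD admin socket, along these lines (exact counter names may
differ by version):

ceph daemon osd.0 perf dump | grep -E 'slow_used_bytes|db_used_bytes'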

On Fri, Dec 21, 2018 at 9:01 AM Janne Johansson  wrote:

> On Thu, 20 Dec 2018 at 22:45, Vladimir Brik wrote:
> > Hello
> > I am considering using logical volumes of an NVMe drive as DB or WAL
> > devices for OSDs on spinning disks.
> > The documentation recommends against DB devices smaller than 4% of slow
> > disk size. Our servers have 16x 10TB HDDs and a single 1.5TB NVMe, so
> > dividing it equally will result in each OSD getting ~90GB DB NVMe
> > volume, which is a lot less than 4%. Will this cause problems down the
> road?
>
> Well, apart from the reply you already got on "one NVMe fails all the
> HDDs it is WAL/DB for",
> the recommendations are about getting the best out of them, especially
> for the DB I suppose.
>
> If one can size stuff up beforehand, then following the recommendations is a
> good choice, but I think
> you should test using it for WALs for instance, and bench it against
> another host with data,
> wal and db on the HDD and see if it helps a lot in your expected use case.
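>
> To make the comparison concrete, the two layouts could be set up with
> something like this (device names are only examples):
>
> # WAL on the NVMe, DB left with the data on the HDD
> ceph-volume lvm create --bluestore --data /dev/sdb --block.wal /dev/nvme0n1p1
> # everything (data, DB and WAL) on the HDD
> ceph-volume lvm create --bluestore --data /dev/sdc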
>
> --
> May the most significant bit of your life be positive.
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] active+recovering+degraded after cluster reboot

2018-12-15 Thread David C
Yep, that cleared it. Sorry for the noise!
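
For the record (in case anyone finds this thread later), clearing the flags
was just the usual:

ceph osd unset norecover
ceph osd unset nobackfill
ceph osd unset noout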

On Sun, Dec 16, 2018 at 12:16 AM David C  wrote:

> Hi Paul
>
> Thanks for the response. Not yet, just being a bit cautious ;) I'll go
> ahead and do that.
>
> Thanks
> David
>
>
> On Sat, 15 Dec 2018, 23:39 Paul Emmerich wrote:
>> Did you unset norecover?
>>
>>
>> Paul
>>
>> --
>> Paul Emmerich
>>
>> Looking for help with your Ceph cluster? Contact us at https://croit.io
>>
>> croit GmbH
>> Freseniusstr. 31h
>> 81247 München
>> www.croit.io
>> Tel: +49 89 1896585 90
>>
>> On Sun, Dec 16, 2018 at 12:22 AM David C  wrote:
>> >
>> > Hi All
>> >
>> > I have what feels like a bit of a rookie question
>> >
>> > I shut down a Luminous 12.2.1 cluster with noout,nobackfill,norecover set
>> >
>> > Before shutting down, all PGs were active+clean
>> >
>> > I brought the cluster up, all daemons started and all but 2 PGs are
>> active+clean
>> >
>> > I have 2 pgs showing: "active+recovering+degraded"
>> >
>> > It's been reporting this for about an hour with no signs of clearing on
>> its own
>> >
>> > Ceph health detail shows: PG_DEGRADED Degraded data redundancy:
>> 2/131709267 objects degraded (0.000%), 2 pgs unclean, 2 pgs degraded
>> >
>> > I've tried restarting MONs and all OSDs in the cluster.
>> >
>> > How would you recommend I proceed at this point?
>> >
>> > Thanks
>> > David
>> >
>> >
>> >
>> >
>> > ___
>> > ceph-users mailing list
>> > ceph-users@lists.ceph.com
>> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] active+recovering+degraded after cluster reboot

2018-12-15 Thread David C
Hi Paul

Thanks for the response. Not yet, just being a bit cautious ;) I'll go
ahead and do that.

Thanks
David


On Sat, 15 Dec 2018, 23:39 Paul Emmerich wrote:
> Did you unset norecover?
>
>
> Paul
>
> --
> Paul Emmerich
>
> Looking for help with your Ceph cluster? Contact us at https://croit.io
>
> croit GmbH
> Freseniusstr. 31h
> 81247 München
> www.croit.io
> Tel: +49 89 1896585 90
>
> On Sun, Dec 16, 2018 at 12:22 AM David C  wrote:
> >
> > Hi All
> >
> > I have what feels like a bit of a rookie question
> >
> > I shut down a Luminous 12.2.1 cluster with noout,nobackfill,norecover set
> >
> > Before shutting down, all PGs were active+clean
> >
> > I brought the cluster up, all daemons started and all but 2 PGs are
> active+clean
> >
> > I have 2 pgs showing: "active+recovering+degraded"
> >
> > It's been reporting this for about an hour with no signs of clearing on
> its own
> >
> > Ceph health detail shows: PG_DEGRADED Degraded data redundancy:
> 2/131709267 objects degraded (0.000%), 2 pgs unclean, 2 pgs degraded
> >
> > I've tried restarting MONs and all OSDs in the cluster.
> >
> > How would you recommend I proceed at this point?
> >
> > Thanks
> > David
> >
> >
> >
> >
> > ___
> > ceph-users mailing list
> > ceph-users@lists.ceph.com
> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] active+recovering+degraded after cluster reboot

2018-12-15 Thread David C
Hi All

I have what feels like a bit of a rookie question

I shut down a Luminous 12.2.1 cluster with noout,nobackfill,norecover set

Before shutting down, all PGs were active+clean

I brought the cluster up, all daemons started and all but 2 PGs are
active+clean

I have 2 pgs showing: "active+recovering+degraded"

It's been reporting this for about an hour with no signs of clearing on
its own

Ceph health detail shows: PG_DEGRADED Degraded data redundancy: 2/131709267
objects degraded (0.000%), 2 pgs unclean, 2 pgs degraded

I've tried restarting MONs and all OSDs in the cluster.

How would you recommend I proceed at this point?

Thanks
David
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Why does "df" against a mounted cephfs report (vastly) different free space?

2018-12-12 Thread David Young
Hi all,

I have a cluster used exclusively for CephFS (an EC "media" pool, and a standard
metadata pool for the CephFS).

"ceph -s" shows me:

---
  data:
pools:   2 pools, 260 pgs
objects: 37.18 M objects, 141 TiB
usage:   177 TiB used, 114 TiB / 291 TiB avail
pgs: 260 active+clean
---

But 'df' against the mounted cephfs shows me:

---
root@node1:~# df | grep ceph
Filesystem          1K-blocks     Used          Available  Use%  Mounted on
10.20.30.1:6789:/   151264890880  151116939264  147951616  100%  /ceph

root@node1:~# df -h | grep ceph
Filesystem Size  Used Avail Use% Mounted on
10.20.30.1:6789:/  141T  141T  142G 100% /ceph
root@node1:~#
---

And "rados df" shows me:

---
root@node1:~# rados df
POOL_NAME        USED     OBJECTS   CLONES  COPIES     MISSING_ON_PRIMARY  UNFOUND  DEGRADED  RD_OPS     RD       WR_OPS    WR
cephfs_metadata  173 MiB  27239     0       54478      0                   0        0         1102765    9.8 GiB  8810925   43 GiB
media            141 TiB  37152647  0       185763235  0                   0        0         110377842  120 TiB  74835385  183 TiB

total_objects37179886
total_used   177 TiB
total_avail  114 TiB
total_space  291 TiB
root@node1:~#
---

The used figure that df reports seems accurate (141TB at 4+1 EC), but the
amount of remaining space is baffling me. Have I hit a limitation due to the
number of PGs I created, or is the remaining free space just being misreported
by df/CephFS?
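
For anyone reproducing this, the per-pool MAX AVAIL figures I'm comparing
against come from the usual commands:

ceph df detail
ceph osd df tree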

Thanks!
D
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Luminous v12.2.10 released

2018-12-12 Thread David Galloway
Hey Dan,

Thanks for bringing this to our attention.  Looks like it did get left
out.  I just pushed the package and added a step to the release process
to make sure packages don't get skipped again like that.

- David

On 12/12/2018 11:03 AM, Dan van der Ster wrote:
> Hey Abhishek,
> 
> We just noticed that the debuginfo is missing for 12.2.10:
> http://download.ceph.com/rpm-luminous/el7/x86_64/ceph-debuginfo-12.2.10-0.el7.x86_64.rpm
> 
> Did something break in the publishing?
> 
> Cheers, Dan
> 
> On Tue, Nov 27, 2018 at 3:50 PM Abhishek Lekshmanan  wrote:
>>
>>
>> We're happy to announce the tenth bug fix release of the Luminous
>> v12.2.x long term stable release series. The previous release, v12.2.9,
>> introduced the PG hard-limit patches which were found to cause an issue
>> in certain upgrade scenarios, and this release was expedited to revert
>> those patches. If you already successfully upgraded to v12.2.9, you
>> should **not** upgrade to v12.2.10, but rather **wait** for a release in
>> which http://tracker.ceph.com/issues/36686 is addressed. All other users
>> are encouraged to upgrade to this release.
>>
>> Notable Changes
>> ---
>>
>> * This release reverts the PG hard-limit patches added in v12.2.9 in which,
>>   a partial upgrade during a recovery/backfill, can cause the osds on the
>>   previous version, to fail with assert(trim_to <= info.last_complete). The
>>   workaround for users is to upgrade and restart all OSDs to a version with 
>> the
>>   pg hard limit, or only upgrade when all PGs are active+clean.
>>
>>   See also: http://tracker.ceph.com/issues/36686
>>
>>   As mentioned above if you've successfully upgraded to v12.2.9 DO NOT
>>   upgrade to v12.2.10 until the linked tracker issue has been fixed.
>>
>> * The bluestore_cache_* options are no longer needed. They are replaced
>>   by osd_memory_target, defaulting to 4GB. BlueStore will expand
>>   and contract its cache to attempt to stay within this
>>   limit. Users upgrading should note this is a higher default
>>   than the previous bluestore_cache_size of 1GB, so OSDs using
>>   BlueStore will use more memory by default.
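>>
>>   For example, to cap an OSD at roughly 2 GiB one would now set something
>>   like this in ceph.conf (value in bytes; the number is only an illustration):
>>
>>   [osd]
>>   osd_memory_target = 2147483648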
>>
>>   For more details, see BlueStore docs[1]
>>
>>
>> For the complete release notes with changelog, please check out the
>> release blog entry at:
>> http://ceph.com/releases/v12-2-10-luminous-released
>>
>> Getting ceph:
>> 
>> * Git at git://github.com/ceph/ceph.git
>> * Tarball at http://download.ceph.com/tarballs/ceph-12.2.10.tar.gz
>> * For packages, see http://docs.ceph.com/docs/master/install/get-packages/
>> * Release git sha1: 177915764b752804194937482a39e95e0ca3de94
>>
>>
>> [1]: 
>> http://docs.ceph.com/docs/master/rados/configuration/bluestore-config-ref/#cache-size
>>
>> --
>> Abhishek Lekshmanan
>> SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton,
>> HRB 21284 (AG Nürnberg)
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Deploying an Active/Active NFS Cluster over CephFS

2018-12-12 Thread David C
Hi Jeff

Many thanks for this! Looking forward to testing it out.

Could you elaborate a bit on why Nautilus is recommended for this set-up,
please? Would attempting this with a Luminous cluster be a non-starter?



On Wed, 12 Dec 2018, 12:16 Jeff Layton wrote:
> (Sorry for the duplicate email to ganesha lists, but I wanted to widen
> it to include the ceph lists)
>
> In response to some cries for help over IRC, I wrote up this blog post
> the other day, which discusses how to set up parallel serving over
> CephFS:
>
>
> https://jtlayton.wordpress.com/2018/12/10/deploying-an-active-active-nfs-cluster-over-cephfs/
>
> Feel free to comment if you have questions. We may be want to eventually
> turn this into a document in the ganesha or ceph trees as well.
>
> Cheers!
> --
> Jeff Layton 
>
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Lost 1/40 OSDs at EC 4+1, now PGs are incomplete

2018-12-11 Thread David Young
(accidentally forgot to reply to the list)

> Thank you, setting min_size to 4 allowed I/O again, and the 39 incomplete PGs 
> are now:
>
> 39  active+undersized+degraded+remapped+backfilling
>
> Once backfilling is done, I'll increase min_size to 5 again.
>
> Am I likely to encounter this issue whenever I lose an OSD (I/O freezes and
> manually reducing min_size is required), and is there anything I should be doing
> differently?
>
> Thanks again!
> D
>
> Sent with [ProtonMail](https://protonmail.com) Secure Email.
>
> ‐‐‐ Original Message ‐‐‐
> On Wednesday, December 12, 2018 3:31 PM, Ashley Merrick 
>  wrote:
>
>> With EC the min size is set to K + 1.
>>
>> Generally EC is used with an M of 2 or more. The reason min_size is set to
>> K + 1 is that you are now in a state where a further OSD loss would leave some
>> PGs without at least K shards available, as you only have 1 extra M.
>>
>> As per the error you can get your pool back online by setting min_size to 4.
>>
>> However this would only be a temp fix while you get the OSD back online / 
>> rebuilt so you can go back to your 4 + 1 state.
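>>
>> In command terms that would be roughly (pool name taken from your output):
>>
>> ceph osd pool set media min_size 4
>> # ...and once the rebuilt OSD is back in and recovery has finished:
>> ceph osd pool set media min_size 5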
>>
>> ,Ash
>>
>> On Wed, 12 Dec 2018 at 10:27 AM, David Young  
>> wrote:
>>
>>> Hi all,
>>>
>>> I have a small 2-node cluster with 40 OSDs, using erasure coding 4+1
>>>
>>> I lost osd38, and now I have 39 incomplete PGs.
>>>
>>> ---
>>> PG_AVAILABILITY Reduced data availability: 39 pgs inactive, 39 pgs 
>>> incomplete
>>> pg 22.2 is incomplete, acting [19,33,10,8,29] (reducing pool media 
>>> min_size from 5 may help; search ceph.com/docs for 'incomplete')
>>> pg 22.f is incomplete, acting [17,9,23,14,15] (reducing pool media min_size
>>> from 5 may help; search ceph.com/docs for 'incomplete')
>>> pg 22.12 is incomplete, acting [7,33,10,31,29] (reducing pool media 
>>> min_size from 5 may help; search ceph.com/docs for 'incomplete')
>>> pg 22.13 is incomplete, acting [23,0,15,33,13] (reducing pool media 
>>> min_size from 5 may help; search ceph.com/docs for 'incomplete')
>>> pg 22.23 is incomplete, acting [29,17,18,15,12] (reducing pool media 
>>> min_size from 5 may help; search ceph.com/docs for 'incomplete')
>>> 
>>> ---
>>>
>>> My EC profile is below:
>>>
>>> ---
>>> root@prod1:~# ceph osd erasure-code-profile get ec-41-profile
>>> crush-device-class=
>>> crush-failure-domain=osd
>>> crush-root=default
>>> jerasure-per-chunk-alignment=false
>>> k=4
>>> m=1
>>> plugin=jerasure
>>> technique=reed_sol_van
>>> w=8
>>> ---
>>>
>>> When I query one of the incomplete PGs, I see this:
>>>
>>> ---
>>> "recovery_state": [
>>> {
>>> "name": "Started/Primary/Peering/Incomplete",
>>> "enter_time": "2018-12-11 20:46:11.645796",
>>> "comment": "not enough complete instances of this PG"
>>> },
>>> ---
>>>
>>> And this:
>>>
>>> ---
>>> "probing_osds": [
>>> "0(4)",
>>> "7(2)",
>>> "9(1)",
>>> "11(4)",
>>> "22(3)",
>>> "29(2)",
>>> "36(0)"
>>> ],
>>> "down_osds_we_would_probe": [
>>> 38
>>> ],
>>> "peering_blocked_by": []
>>> },
>>> ---
>>>
>>> I have set this in /etc/ceph/ceph.conf to no effect:
>>>osd_find_best_info_ignore_history_les = true
>>>
>>> As a result of the incomplete PGs, I/O is currently frozen to at least part
>>> of my cephfs.
>>>
>>> I expected to be able to tolerate the loss of an OSD without issue; is
>>> there anything I can do to restore these incomplete PGs?
>>>
>>> When I bring back a new osd38, I see:
>>> ---
>>> "probing_osds": [
>>> "4(2)",
>>> "11(3)",
>>> "22(1)",
>>> "24(1)",
>>> "26(2)",
>>> "36(4)",
>>> "38(1)",
>>> "39(0)"
>>> ],
>>> "down_osds_we_would_probe": [],
>>> "peering_blocked_by": []
>>> },
>>> {
>>> "name": "Started",
>>> "enter_time": "2018-12-11 21:06:35.307379"
>>> }
>>> ---
>>>
>>> But my recovery state is still:
>>>
>>> ---
>>> "recovery_state": [
>>> {
>>> "name": "Started/Primary/Peering/Incomplete",
>>> "enter_time": "2018-12-11 21:06:35.320292",
>>> "comment": "not enough complete instances of this PG"
>>> },
>>> ---
>>>
>>> Any ideas?
>>>
>>> Thanks!
>>> D
>>>
>>> ___
>>> ceph-users mailing list
>>> ceph-users@lists.ceph.com
>>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Lost 1/40 OSDs at EC 4+1, now PGs are incomplete

2018-12-11 Thread David Young
Hi all,

I have a small 2-node cluster with 40 OSDs, using erasure coding 4+1

I lost osd38, and now I have 39 incomplete PGs.

---
PG_AVAILABILITY Reduced data availability: 39 pgs inactive, 39 pgs incomplete
pg 22.2 is incomplete, acting [19,33,10,8,29] (reducing pool media min_size 
from 5 may help; search ceph.com/docs for 'incomplete')
pg 22.f is incomplete, acting [17,9,23,14,15] (reducing pool media min_size 
from 5 may help; search ceph.com/docs for 'incomplete')
pg 22.12 is incomplete, acting [7,33,10,31,29] (reducing pool media 
min_size from 5 may help; search ceph.com/docs for 'incomplete')
pg 22.13 is incomplete, acting [23,0,15,33,13] (reducing pool media 
min_size from 5 may help; search ceph.com/docs for 'incomplete')
pg 22.23 is incomplete, acting [29,17,18,15,12] (reducing pool media 
min_size from 5 may help; search ceph.com/docs for 'incomplete')

---

My EC profile is below:

---
root@prod1:~# ceph osd erasure-code-profile get ec-41-profile
crush-device-class=
crush-failure-domain=osd
crush-root=default
jerasure-per-chunk-alignment=false
k=4
m=1
plugin=jerasure
technique=reed_sol_van
w=8
---

When I query one of the incomplete PGs, I see this:

---
"recovery_state": [
{
"name": "Started/Primary/Peering/Incomplete",
"enter_time": "2018-12-11 20:46:11.645796",
"comment": "not enough complete instances of this PG"
},
---

And this:

---
"probing_osds": [
"0(4)",
"7(2)",
"9(1)",
"11(4)",
"22(3)",
"29(2)",
"36(0)"
],
"down_osds_we_would_probe": [
38
],
"peering_blocked_by": []
},
---

I have set this in /etc/ceph/ceph.conf to no effect:
   osd_find_best_info_ignore_history_les = true

As a result of the incomplete PGs, I/O is currently frozen to at least part of
my cephfs.

I expected to be able to tolerate the loss of an OSD without issue; is there
anything I can do to restore these incomplete PGs?

When I bring back a new osd38, I see:
---
"probing_osds": [
"4(2)",
"11(3)",
"22(1)",
"24(1)",
"26(2)",
"36(4)",
"38(1)",
"39(0)"
],
"down_osds_we_would_probe": [],
"peering_blocked_by": []
},
{
"name": "Started",
"enter_time": "2018-12-11 21:06:35.307379"
}
---

But my recovery state is still:

---
"recovery_state": [
{
"name": "Started/Primary/Peering/Incomplete",
"enter_time": "2018-12-11 21:06:35.320292",
"comment": "not enough complete instances of this PG"
},
---

Any ideas?

Thanks!
D
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] CEPH DR RBD Mount

2018-11-30 Thread David C
Is that one big xfs filesystem? Are you able to mount with krbd?
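
I.e. something along the lines of the following (pool/image names taken from
your status output; read-only and norecovery assuming XFS on a non-primary
mirror image, device name is only an example):

rbd --cluster cephdr map nfs/dir_research
mount -o ro,norecovery /dev/rbd0 /mnt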

On Tue, 27 Nov 2018, 13:49 Vikas Rana wrote:
> Hi There,
>
> We are replicating a 100TB RBD image to a DR site. Replication works fine.
>
> rbd --cluster cephdr mirror pool status nfs --verbose
>
> health: OK
>
> images: 1 total
>
> 1 replaying
>
>
>
> dir_research:
>
>   global_id:   11e9cbb9-ce83-4e5e-a7fb-472af866ca2d
>
>   state:   up+replaying
>
>   description: replaying, master_position=[object_number=591701,
> tag_tid=1, entry_tid=902879873], mirror_position=[object_number=446354,
> tag_tid=1, entry_tid=727653146], entries_behind_master=175226727
>
>   last_update: 2018-11-14 16:17:23
>
>
>
>
> We then use nbd to map the RBD image at the DR site, but when we try to
> mount it, we get:
>
>
> # mount /dev/nbd2 /mnt
>
> mount: block device /dev/nbd2 is write-protected, mounting read-only
>
> *mount: /dev/nbd2: can't read superblock*
>
>
>
> We are using 12.2.8.
>
>
> Any help will be greatly appreciated.
>
>
> Thanks,
>
> -Vikas
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Migrate OSD journal to SSD partition

2018-11-19 Thread David Turner
For this the procedure is generally to stop the OSD, flush the journal,
update the journal symlink on the OSD to the new journal location, run
mkjournal, and start the OSD.  You shouldn't need to do anything in the
ceph.conf file.
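
Roughly, for osd.0 in your example (paths are illustrative; double-check the
journal symlink/uuid handling on your system before relinking):

systemctl stop ceph-osd@0
ceph-osd -i 0 --flush-journal
ln -sf /dev/disk/by-partlabel/journal-1 /var/lib/ceph/osd/ceph-0/journal
ceph-osd -i 0 --mkjournal
systemctl start ceph-osd@0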

On Thu, Nov 8, 2018 at 2:41 AM  wrote:

> Hi all,
>
>
>
> I have been trying to migrate the journal to an SSD partition for a while.
> Basically, I followed the guide here [1]. I have the below configuration
> defined in ceph.conf:
>
>
>
> [osd.0]
>
> osd_journal = /dev/disk/by-partlabel/journal-1
>
>
>
> And then create the journal in this way,
>
> # ceph-osd -i 0 --mkjournal
>
>
>
> After that, I started the OSD, and I saw the service started
> successfully from the log printed out on the console:
>
> 08 14:03:35 ceph1 ceph-osd[5111]: starting osd.0 at :/0 osd_data
> /var/lib/ceph/osd/ceph-0 /dev/disk/by-partlabel/journal-1
>
> 08 14:03:35 ceph1 ceph-osd[5111]: 2018-11-08 14:03:35.618247 7fe8b54b28c0
> -1 osd.0 766 log_to_monitors {default=true}
>
>
>
> But I am not sure whether the new journal is in effect or not; it looks like it
> is still using the old partition (/dev/sdc2) for the journal, and the new partition,
> which is actually "/dev/sde1", has no journal information on it:
>
>
>
> # ceph-disk list
>
>
>
> /dev/sdc :
>
> /dev/sdc2 ceph journal, for /dev/sdc1
>
> /dev/sdc1 ceph data, active, cluster ceph, osd.0, journal /dev/sdc2
>
> /dev/sdd :
>
> /dev/sdd2 ceph journal, for /dev/sdd1
>
> /dev/sdd1 ceph data, active, cluster ceph, osd.1, journal /dev/sdd2
>
> /dev/sde :
>
> /dev/sde1 other, 0fc63daf-8483-4772-8e79-3d69d8477de4
>
> /dev/sdf other, unknown
>
>
>
> # ls -l /var/lib/ceph/osd/ceph-0/journal
>
> lrwxrwxrwx 1 ceph ceph 58  21  2018 /var/lib/ceph/osd/ceph-0/journal ->
> /dev/disk/by-partuuid/5b5cd6f6-5de4-44f3-9d33-e8a7f4b59f61
>
>
>
> # ls -l /dev/disk/by-partuuid/5b5cd6f6-5de4-44f3-9d33-e8a7f4b59f61
>
> lrwxrwxrwx 1 root root 10 8 13:59
> /dev/disk/by-partuuid/5b5cd6f6-5de4-44f3-9d33-e8a7f4b59f61 -> ../../sdc2
>
>
>
>
>
> My question is: how do I know which partition is taking the role of the journal?
> Where can I see that the new journal partition is linked?
>
>
>
> Any comments are highly appreciated!
>
>
>
>
>
> [1] https://fatmin.com/2015/08/11/ceph-show-osd-to-journal-mapping/
>
>
>
>
>
> Best Regards,
>
> Dave Chen
>
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Mimic - EC and crush rules - clarification

2018-11-16 Thread David Turner
The difference for 2+2 vs 2x replication isn't in the amount of space being
used or saved, but in the number of OSDs you can safely lose without any
data loss or outages.  2x replication is generally considered very unsafe
for data integrity, but 2+2 is as resilient as 3x replication while
only using as much space as 2x replication.
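
For reference, a 2+2 profile and pool would be created along these lines
(profile/pool names and PG count are only examples):

ceph osd erasure-code-profile set ec-2-2 k=2 m=2 crush-failure-domain=host
ceph osd pool create ecpool 64 64 erasure ec-2-2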

On Thu, Nov 1, 2018 at 11:25 PM Wladimir Mutel  wrote:

> David Turner wrote:
> > Yes, when creating an EC profile, it automatically creates a CRUSH rule
> > specific for that EC profile.  You are also correct that 2+1 doesn't
> > really have any resiliency built in.  2+2 would allow 1 node to go down
> > while still having your data accessible.  It will use 2x data to raw as
>
> Is not EC 2+2 the same as 2x replication (i.e. RAID1)?
> Is not the benefit and intention of EC to allow equivalent replication
> factors between >1 and <2 to be chosen?
> That's why it is recommended to have m < k: when you have m==k, it is equivalent to 2x
> replication, with m==2k to 3x replication, and so on.
> And correspondingly, with m==1 you have the equivalent reliability
> of RAID5, with m==2 that of RAID6, and you start to have more
> "interesting" reliability factors only when you can allow m>2
> and k>m. Overall, your reliability in Ceph is measured as the
> cluster rebuild/performance degradation time in case of
> up to m OSDs failing, provided that no more than m OSDs
> (or larger failure domains) have failed at once.
> Sure, EC is beneficial only when you have enough failure domains
> (i.e. hosts). My criterion is that you should have more hosts
> than you have individual OSDs within a single host,
> i.e. at least 8 (and better >8) hosts when you have 8 OSDs
> per host.
>
> > opposed to the 1.5x of 2+1, but it gives you resiliency.  The example in
> > your command of 3+2 is not possible with your setup.  May I ask why you
> > want EC on such a small OSD count?  I'm guessing to not use as much
> > storage on your SSDs, but I would just suggest going with replica with
> > such a small cluster.  If you have a larger node/OSD count, then you can
> > start seeing if EC is right for your use case, but if this is production
> > data... I wouldn't risk it.
>
> > When setting the crush rule, it wants the name of it, ssdrule, not 2.
>
>
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] ceph 12.2.9 release

2018-11-07 Thread David Turner
My big question is that we've had a few of these releases this year that
are bugged and shouldn't be upgraded to... They don't have any release
notes or announcement and the only time this comes out is when users
finally ask about it weeks later.  Why is this not proactively announced to
avoid a problematic release and hopefully prevent people from installing
it?  It would be great if there were actual release notes saying not to
upgrade to this version, or something along those lines.

On Wed, Nov 7, 2018 at 11:16 AM Ashley Merrick 
wrote:

> I am seeing this on the latest mimic on my test cluster aswel.
>
> Every automatic deep-scrub comes back as inconsistent, but doing another
> manual scrub comes back as fine and clear each time.
>
> Not sure if related or not..
>
> On Wed, 7 Nov 2018 at 11:57 PM, Christoph Adomeit <
> christoph.adom...@gatworks.de> wrote:
>
>> Hello together,
>>
>> we have upgraded to 12.2.9 because it was in the official repos.
>>
>> Right after the update and some scrubs we have issues.
>>
>> This morning after regular scrubs we had around 10% of all pgs inconsistent:
>>
>> pgs: 4036 active+clean
>>   380  active+clean+inconsistent
>>
>> After repairing these 380 pgs we again have:
>>
>> 1/93611534 objects unfound (0.000%)
>> 28   active+clean+inconsistent
>> 1active+recovery_wait+degraded
>>
>> Now we have stopped repairing because it does not seem to solve the problem
>> and more and more error messages are occurring. So far we have not seen
>> corruption, but we do not feel comfortable with the cluster.
>>
>> What do you suggest, wait for 12.2.10 ? Roll Back to 12.2.8 ?
>>
>> Is it dangerous for our data to leave the cluster running?
>>
>> I am sure we do not have hardware errors and that these errors came with
>> the update to 12.2.9.
>>
>> Thanks
>>   Christoph
>>
>>
>>
>> On Wed, Nov 07, 2018 at 07:39:59AM -0800, Gregory Farnum wrote:
>> > On Wed, Nov 7, 2018 at 5:58 AM Simon Ironside 
>> > wrote:
>> >
>> > >
>> > >
>> > > On 07/11/2018 10:59, Konstantin Shalygin wrote:
>> > > >> I wonder if there is any release announcement for ceph 12.2.9 that
>> I
>> > > missed.
>> > > >> I just found the new packages on download.ceph.com, is this an
>> official
>> > > >> release?
>> > > >
>> > > > This is because 12.2.9 have a several bugs. You should avoid to use
>> this
>> > > > release and wait for 12.2.10
>> > >
>> > > Argh! What's it doing in the repos then?? I've just upgraded to it!
>> > > What are the bugs? Is there a thread about them?
>> >
>> >
>> > If you’ve already upgraded and have no issues then you won’t have any
>> > trouble going forward — except perhaps on the next upgrade, if you do it
>> > while the cluster is unhealthy.
>> >
>> > I agree that it’s annoying when these issues make it out. We’ve had
>> ongoing
>> > discussions to try and improve the release process so it’s less
>> drawn-out
>> > and to prevent these upgrade issues from making it through testing, but
>> > nobody has resolved it yet. If anybody has experience working with deb
>> > repositories and handling releases, the Ceph upstream could use some
>> > help... ;)
>> > -Greg
>> >
>> >
>> > >
>> > > Simon
>> > > ___
>> > > ceph-users mailing list
>> > > ceph-users@lists.ceph.com
>> > > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>> > >
>>
>> > ___
>> > ceph-users mailing list
>> > ceph-users@lists.ceph.com
>> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>
>> ___
>> ceph-users mailing list
>> ceph-users@lists.ceph.com
>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Any backfill in our cluster makes the cluster unusable and takes forever

2018-11-05 Thread David Turner
Correct, it's just that the ceph-kvstore-tool for Luminous doesn't have the
ability to migrate between them.  It exists in Jewel 10.2.11 and in Mimic,
but it doesn't exist in Luminous.  There's no structural difference in the
omap backend so I'm planning to just use a Mimic version of the tool to
update my omap backends.

On Mon, Nov 5, 2018 at 4:26 PM Pavan Rallabhandi <
prallabha...@walmartlabs.com> wrote:

> Not sure I understand that, but starting Luminous, the filestore omap
> backend is rocksdb by default.
>
>
>
> *From: *David Turner 
> *Date: *Monday, November 5, 2018 at 3:25 PM
>
>
> *To: *Pavan Rallabhandi 
> *Cc: *ceph-users 
> *Subject: *EXT: Re: [ceph-users] Any backfill in our cluster makes the
> cluster unusable and takes forever
>
>
>
> Digging into the code a little more, that functionality was added in
> 10.2.11 and 13.0.1, but it still isn't anywhere in the 12.x.x Luminous
> version.  That's so bizarre.
>
>
>
> On Sat, Nov 3, 2018 at 11:56 AM Pavan Rallabhandi <
> prallabha...@walmartlabs.com> wrote:
>
> Not exactly, this feature was supported in Jewel starting 10.2.11, ref
> https://github.com/ceph/ceph/pull/18010
>
>
>
> I thought you mentioned you were using Luminous 12.2.4.
>
>
>
> *From: *David Turner 
> *Date: *Friday, November 2, 2018 at 5:21 PM
>
>
> *To: *Pavan Rallabhandi 
> *Cc: *ceph-users 
> *Subject: *EXT: Re: [ceph-users] Any backfill in our cluster makes the
> cluster unusable and takes forever
>
>
>
> That makes so much more sense. It seems like RHCS had had this ability
> since Jewel while it was only put into the community version as of Mimic.
> So my version of the tool isn't actually capable of changing the backend
> db. While digging into the code I did find a bug with the creation of the
> rocksdb backend created with ceph-kvstore-tool. It doesn't use the ceph
> defaults or any settings in your config file for the db settings. I'm
> working on testing a modified version that should take those settings into
> account. If the fix does work, the fix will be able to apply to a few other
> tools as well that can be used to set up the omap backend db.
>
>
>
> On Fri, Nov 2, 2018, 4:26 PM Pavan Rallabhandi <
> prallabha...@walmartlabs.com> wrote:
>
> It was Redhat versioned Jewel. But may be more relevantly, we are on
> Ubuntu unlike your case.
>
>
>
> *From: *David Turner 
> *Date: *Friday, November 2, 2018 at 10:24 AM
>
>
> *To: *Pavan Rallabhandi 
> *Cc: *ceph-users 
> *Subject: *EXT: Re: [ceph-users] Any backfill in our cluster makes the
> cluster unusable and takes forever
>
>
>
> Pavan, which version of Ceph were you using when you changed your backend
> to rocksdb?
>
>
>
> On Mon, Oct 1, 2018 at 4:24 PM Pavan Rallabhandi <
> prallabha...@walmartlabs.com> wrote:
>
> Yeah, I think this is something to do with the CentOS binaries, sorry that
> I couldn’t be of much help here.
>
> Thanks,
> -Pavan.
>
> From: David Turner 
> Date: Monday, October 1, 2018 at 1:37 PM
> To: Pavan Rallabhandi 
> Cc: ceph-users 
> Subject: EXT: Re: [ceph-users] Any backfill in our cluster makes the
> cluster unusable and takes forever
>
> I tried modifying filestore_rocksdb_options
> by removing compression=kNoCompression as well as setting it
> to compression=kSnappyCompression.  Leaving it with kNoCompression or
> removing it results in the same segfault in the previous log.  Setting it
> to kSnappyCompression resulted in [1] this being logged and the OSD just
> failing to start instead of segfaulting.  Is there anything else you would
> suggest trying before I purge this OSD from the cluster?  I'm afraid it
> might be something with the CentOS binaries.
>
> [1] 2018-10-01 17:10:37.134930 7f1415dfcd80  0  set rocksdb option
> compression = kSnappyCompression
> 2018-10-01 17:10:37.134986 7f1415dfcd80 -1 rocksdb: Invalid argument:
> Compression type Snappy is not linked with the binary.
> 2018-10-01 17:10:37.135004 7f1415dfcd80 -1
> filestore(/var/lib/ceph/osd/ceph-1) mount(1723): Error initializing rocksdb
> :
> 2018-10-01 17:10:37.135020 7f1415dfcd80 -1 osd.1 0 OSD:init: unable to
> mount object store
> 2018-10-01 17:10:37.135029 7f1415dfcd80 -1 ESC[0;31m ** ERROR: osd init
> failed: (1) Operation not permittedESC[0m
>
> On Sat, Sep 29, 2018 at 1:57 PM Pavan Rallabhandi  prallabha...@walmartlabs.com> wrote:
> I looked at one of my test clusters running Jewel on Ubuntu 16.04, and
> interestingly I found this(below) in one of the OSD logs, which is
> different from your OSD boot log, where none of the compression algorithms
> seem to be supported. This hints more at how rocksdb was built on CentOS

Re: [ceph-users] Any backfill in our cluster makes the cluster unusable and takes forever

2018-11-05 Thread David Turner
Digging into the code a little more, that functionality was added in
10.2.11 and 13.0.1, but it still isn't anywhere in the 12.x.x Luminous
version.  That's so bizarre.

On Sat, Nov 3, 2018 at 11:56 AM Pavan Rallabhandi <
prallabha...@walmartlabs.com> wrote:

> Not exactly, this feature was supported in Jewel starting 10.2.11, ref
> https://github.com/ceph/ceph/pull/18010
>
>
>
> I thought you mentioned you were using Luminous 12.2.4.
>
>
>
> *From: *David Turner 
> *Date: *Friday, November 2, 2018 at 5:21 PM
>
>
> *To: *Pavan Rallabhandi 
> *Cc: *ceph-users 
> *Subject: *EXT: Re: [ceph-users] Any backfill in our cluster makes the
> cluster unusable and takes forever
>
>
>
> That makes so much more sense. It seems like RHCS had had this ability
> since Jewel while it was only put into the community version as of Mimic.
> So my version of the tool isn't actually capable of changing the backend
> db. While digging into the code I did find a bug with the creation of the
> rocksdb backend created with ceph-kvstore-tool. It doesn't use the ceph
> defaults or any settings in your config file for the db settings. I'm
> working on testing a modified version that should take those settings into
> account. If the fix does work, the fix will be able to apply to a few other
> tools as well that can be used to set up the omap backend db.
>
>
>
> On Fri, Nov 2, 2018, 4:26 PM Pavan Rallabhandi <
> prallabha...@walmartlabs.com> wrote:
>
> It was Redhat versioned Jewel. But may be more relevantly, we are on
> Ubuntu unlike your case.
>
>
>
> *From: *David Turner 
> *Date: *Friday, November 2, 2018 at 10:24 AM
>
>
> *To: *Pavan Rallabhandi 
> *Cc: *ceph-users 
> *Subject: *EXT: Re: [ceph-users] Any backfill in our cluster makes the
> cluster unusable and takes forever
>
>
>
> Pavan, which version of Ceph were you using when you changed your backend
> to rocksdb?
>
>
>
> On Mon, Oct 1, 2018 at 4:24 PM Pavan Rallabhandi <
> prallabha...@walmartlabs.com> wrote:
>
> Yeah, I think this is something to do with the CentOS binaries, sorry that
> I couldn’t be of much help here.
>
> Thanks,
> -Pavan.
>
> From: David Turner 
> Date: Monday, October 1, 2018 at 1:37 PM
> To: Pavan Rallabhandi 
> Cc: ceph-users 
> Subject: EXT: Re: [ceph-users] Any backfill in our cluster makes the
> cluster unusable and takes forever
>
> I tried modifying filestore_rocksdb_options
> by removing compression=kNoCompression as well as setting it
> to compression=kSnappyCompression.  Leaving it with kNoCompression or
> removing it results in the same segfault in the previous log.  Setting it
> to kSnappyCompression resulted in [1] this being logged and the OSD just
> failing to start instead of segfaulting.  Is there anything else you would
> suggest trying before I purge this OSD from the cluster?  I'm afraid it
> might be something with the CentOS binaries.
>
> [1] 2018-10-01 17:10:37.134930 7f1415dfcd80  0  set rocksdb option
> compression = kSnappyCompression
> 2018-10-01 17:10:37.134986 7f1415dfcd80 -1 rocksdb: Invalid argument:
> Compression type Snappy is not linked with the binary.
> 2018-10-01 17:10:37.135004 7f1415dfcd80 -1
> filestore(/var/lib/ceph/osd/ceph-1) mount(1723): Error initializing rocksdb
> :
> 2018-10-01 17:10:37.135020 7f1415dfcd80 -1 osd.1 0 OSD:init: unable to
> mount object store
> 2018-10-01 17:10:37.135029 7f1415dfcd80 -1 ESC[0;31m ** ERROR: osd init
> failed: (1) Operation not permittedESC[0m
>
> On Sat, Sep 29, 2018 at 1:57 PM Pavan Rallabhandi  prallabha...@walmartlabs.com> wrote:
> I looked at one of my test clusters running Jewel on Ubuntu 16.04, and
> interestingly I found this(below) in one of the OSD logs, which is
> different from your OSD boot log, where none of the compression algorithms
> seem to be supported. This hints more at how rocksdb was built on CentOS
> for Ceph.
>
> 2018-09-29 17:38:38.629112 7fbd318d4b00  4 rocksdb: Compression algorithms
> supported:
> 2018-09-29 17:38:38.629112 7fbd318d4b00  4 rocksdb: Snappy supported: 1
> 2018-09-29 17:38:38.629113 7fbd318d4b00  4 rocksdb: Zlib supported: 1
> 2018-09-29 17:38:38.629113 7fbd318d4b00  4 rocksdb: Bzip supported: 0
> 2018-09-29 17:38:38.629114 7fbd318d4b00  4 rocksdb: LZ4 supported: 0
> 2018-09-29 17:38:38.629114 7fbd318d4b00  4 rocksdb: ZSTD supported: 0
> 2018-09-29 17:38:38.629115 7fbd318d4b00  4 rocksdb: Fast CRC32 supported: 0
>
> On 9/27/18, 2:56 PM, "Pavan Rallabhandi"  prallabha...@walmartlabs.com> wrote:
>
> I see Filestore symbols on the stack, so the bluestore config doesn’t
> affect. And the top frame of the stack hints at a RocksDB issue, 

Re: [ceph-users] Any backfill in our cluster makes the cluster unusable and takes forever

2018-11-02 Thread David Turner
That makes so much more sense. It seems like RHCS had had this ability
since Jewel while it was only put into the community version as of Mimic.
So my version of the tool isn't actually capable of changing the backend
db. While digging into the code I did find a bug with the creation of the
rocksdb backend created with ceph-kvstore-tool. It doesn't use the ceph
defaults or any settings in your config file for the db settings. I'm
working on testing a modified version that should take those settings into
account. If the fix does work, the fix will be able to apply to a few other
tools as well that can be used to set up the omap backend db.

On Fri, Nov 2, 2018, 4:26 PM Pavan Rallabhandi 
wrote:

> It was Redhat versioned Jewel. But may be more relevantly, we are on
> Ubuntu unlike your case.
>
>
>
> *From: *David Turner 
> *Date: *Friday, November 2, 2018 at 10:24 AM
>
>
> *To: *Pavan Rallabhandi 
> *Cc: *ceph-users 
> *Subject: *EXT: Re: [ceph-users] Any backfill in our cluster makes the
> cluster unusable and takes forever
>
>
>
> Pavan, which version of Ceph were you using when you changed your backend
> to rocksdb?
>
>
>
> On Mon, Oct 1, 2018 at 4:24 PM Pavan Rallabhandi <
> prallabha...@walmartlabs.com> wrote:
>
> Yeah, I think this is something to do with the CentOS binaries, sorry that
> I couldn’t be of much help here.
>
> Thanks,
> -Pavan.
>
> From: David Turner 
> Date: Monday, October 1, 2018 at 1:37 PM
> To: Pavan Rallabhandi 
> Cc: ceph-users 
> Subject: EXT: Re: [ceph-users] Any backfill in our cluster makes the
> cluster unusable and takes forever
>
> I tried modifying filestore_rocksdb_options
> by removing compression=kNoCompression as well as setting it
> to compression=kSnappyCompression.  Leaving it with kNoCompression or
> removing it results in the same segfault in the previous log.  Setting it
> to kSnappyCompression resulted in [1] this being logged and the OSD just
> failing to start instead of segfaulting.  Is there anything else you would
> suggest trying before I purge this OSD from the cluster?  I'm afraid it
> might be something with the CentOS binaries.
>
> [1] 2018-10-01 17:10:37.134930 7f1415dfcd80  0  set rocksdb option
> compression = kSnappyCompression
> 2018-10-01 17:10:37.134986 7f1415dfcd80 -1 rocksdb: Invalid argument:
> Compression type Snappy is not linked with the binary.
> 2018-10-01 17:10:37.135004 7f1415dfcd80 -1
> filestore(/var/lib/ceph/osd/ceph-1) mount(1723): Error initializing rocksdb
> :
> 2018-10-01 17:10:37.135020 7f1415dfcd80 -1 osd.1 0 OSD:init: unable to
> mount object store
> 2018-10-01 17:10:37.135029 7f1415dfcd80 -1 ESC[0;31m ** ERROR: osd init
> failed: (1) Operation not permittedESC[0m
>
> On Sat, Sep 29, 2018 at 1:57 PM Pavan Rallabhandi  prallabha...@walmartlabs.com> wrote:
> I looked at one of my test clusters running Jewel on Ubuntu 16.04, and
> interestingly I found this(below) in one of the OSD logs, which is
> different from your OSD boot log, where none of the compression algorithms
> seem to be supported. This hints more at how rocksdb was built on CentOS
> for Ceph.
>
> 2018-09-29 17:38:38.629112 7fbd318d4b00  4 rocksdb: Compression algorithms
> supported:
> 2018-09-29 17:38:38.629112 7fbd318d4b00  4 rocksdb: Snappy supported: 1
> 2018-09-29 17:38:38.629113 7fbd318d4b00  4 rocksdb: Zlib supported: 1
> 2018-09-29 17:38:38.629113 7fbd318d4b00  4 rocksdb: Bzip supported: 0
> 2018-09-29 17:38:38.629114 7fbd318d4b00  4 rocksdb: LZ4 supported: 0
> 2018-09-29 17:38:38.629114 7fbd318d4b00  4 rocksdb: ZSTD supported: 0
> 2018-09-29 17:38:38.629115 7fbd318d4b00  4 rocksdb: Fast CRC32 supported: 0
>
> On 9/27/18, 2:56 PM, "Pavan Rallabhandi"  prallabha...@walmartlabs.com> wrote:
>
> I see Filestore symbols on the stack, so the bluestore config doesn’t
> affect. And the top frame of the stack hints at a RocksDB issue, and there
> are a whole lot of these too:
>
> “2018-09-17 19:23:06.480258 7f1f3d2a7700  2 rocksdb:
> [/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/12.2.4/rpm/el7/BUILD/ceph-12.2.4/src/rocksdb/table/block_based_table_reader.cc:636]
> Cannot find Properties block from file.”
>
> It really seems to be something with RocksDB on centOS. I still think
> you can try removing “compression=kNoCompression” from the
> filestore_rocksdb_options And/Or check if rocksdb is expecting snappy to be
> enabled.
>
> Thanks,
> -Pavan.
>
> From: David Turner <mailto:drakonst...@gmail.com>
> Date: Thursday, September 27, 2018 at 1:18 PM
> To: Pavan Rallabhandi <mailto:prallabha...@walma

Re: [ceph-users] Any backfill in our cluster makes the cluster unusable and takes forever

2018-11-02 Thread David Turner
Pavan, which version of Ceph were you using when you changed your backend
to rocksdb?

On Mon, Oct 1, 2018 at 4:24 PM Pavan Rallabhandi <
prallabha...@walmartlabs.com> wrote:

> Yeah, I think this is something to do with the CentOS binaries, sorry that
> I couldn’t be of much help here.
>
> Thanks,
> -Pavan.
>
> From: David Turner 
> Date: Monday, October 1, 2018 at 1:37 PM
> To: Pavan Rallabhandi 
> Cc: ceph-users 
> Subject: EXT: Re: [ceph-users] Any backfill in our cluster makes the
> cluster unusable and takes forever
>
> I tried modifying filestore_rocksdb_options
> by removing compression=kNoCompression as well as setting it
> to compression=kSnappyCompression.  Leaving it with kNoCompression or
> removing it results in the same segfault in the previous log.  Setting it
> to kSnappyCompression resulted in [1] this being logged and the OSD just
> failing to start instead of segfaulting.  Is there anything else you would
> suggest trying before I purge this OSD from the cluster?  I'm afraid it
> might be something with the CentOS binaries.
>
> [1] 2018-10-01 17:10:37.134930 7f1415dfcd80  0  set rocksdb option
> compression = kSnappyCompression
> 2018-10-01 17:10:37.134986 7f1415dfcd80 -1 rocksdb: Invalid argument:
> Compression type Snappy is not linked with the binary.
> 2018-10-01 17:10:37.135004 7f1415dfcd80 -1
> filestore(/var/lib/ceph/osd/ceph-1) mount(1723): Error initializing rocksdb
> :
> 2018-10-01 17:10:37.135020 7f1415dfcd80 -1 osd.1 0 OSD:init: unable to
> mount object store
> 2018-10-01 17:10:37.135029 7f1415dfcd80 -1 ESC[0;31m ** ERROR: osd init
> failed: (1) Operation not permittedESC[0m
>
> On Sat, Sep 29, 2018 at 1:57 PM Pavan Rallabhandi  prallabha...@walmartlabs.com> wrote:
> I looked at one of my test clusters running Jewel on Ubuntu 16.04, and
> interestingly I found this(below) in one of the OSD logs, which is
> different from your OSD boot log, where none of the compression algorithms
> seem to be supported. This hints more at how rocksdb was built on CentOS
> for Ceph.
>
> 2018-09-29 17:38:38.629112 7fbd318d4b00  4 rocksdb: Compression algorithms
> supported:
> 2018-09-29 17:38:38.629112 7fbd318d4b00  4 rocksdb: Snappy supported: 1
> 2018-09-29 17:38:38.629113 7fbd318d4b00  4 rocksdb: Zlib supported: 1
> 2018-09-29 17:38:38.629113 7fbd318d4b00  4 rocksdb: Bzip supported: 0
> 2018-09-29 17:38:38.629114 7fbd318d4b00  4 rocksdb: LZ4 supported: 0
> 2018-09-29 17:38:38.629114 7fbd318d4b00  4 rocksdb: ZSTD supported: 0
> 2018-09-29 17:38:38.629115 7fbd318d4b00  4 rocksdb: Fast CRC32 supported: 0
>
> On 9/27/18, 2:56 PM, "Pavan Rallabhandi"  prallabha...@walmartlabs.com> wrote:
>
> I see Filestore symbols on the stack, so the bluestore config doesn’t
> affect. And the top frame of the stack hints at a RocksDB issue, and there
> are a whole lot of these too:
>
> “2018-09-17 19:23:06.480258 7f1f3d2a7700  2 rocksdb:
> [/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/12.2.4/rpm/el7/BUILD/ceph-12.2.4/src/rocksdb/table/block_based_table_reader.cc:636]
> Cannot find Properties block from file.”
>
> It really seems to be something with RocksDB on centOS. I still think
> you can try removing “compression=kNoCompression” from the
> filestore_rocksdb_options And/Or check if rocksdb is expecting snappy to be
> enabled.
>
> Thanks,
> -Pavan.
>
> From: David Turner <mailto:drakonst...@gmail.com>
> Date: Thursday, September 27, 2018 at 1:18 PM
> To: Pavan Rallabhandi <mailto:prallabha...@walmartlabs.com>
> Cc: ceph-users <mailto:ceph-users@lists.ceph.com>
> Subject: EXT: Re: [ceph-users] Any backfill in our cluster makes the
> cluster unusable and takes forever
>
> I got pulled away from this for a while.  The error in the log is
> "abort: Corruption: Snappy not supported or corrupted Snappy compressed
> block contents" and the OSD has 2 settings set to snappy by default,
> async_compressor_type and bluestore_compression_algorithm.  Do either of
> these settings affect the omap store?
>
> On Wed, Sep 19, 2018 at 2:33 PM Pavan Rallabhandi <mailto:mailto:
> prallabha...@walmartlabs.com> wrote:
> Looks like you are running on CentOS, fwiw. We’ve successfully ran the
> conversion commands on Jewel, Ubuntu 16.04.
>
> Have a feel it’s expecting the compression to be enabled, can you try
> removing “compression=kNoCompression” from the filestore_rocksdb_options?
> And/or you might want to check if rocksdb is expecting snappy to be enabled.
>
> From: David Turner <mailto:mailto:drakonst...@gmail.com>

Re: [ceph-users] Mimic - EC and crush rules - clarification

2018-11-01 Thread David Turner
Yes, when creating an EC profile, it automatically creates a CRUSH rule
specific for that EC profile.  You are also correct that 2+1 doesn't really
have any resiliency built in.  2+2 would allow 1 node to go down while
still having your data accessible.  It will use 2x data to raw as opposed
to the 1.5x of 2+1, but it gives you resiliency.  The example in your
command of 3+2 is not possible with your setup.  May I ask why you want EC
on such a small OSD count?  I'm guessing to not use as much storage on your
SSDs, but I would just suggest going with replica with such a small
cluster.  If you have a larger node/OSD count, then you can start seeing if
EC is right for your use case, but if this is production data... I wouldn't
risk it.

When setting the crush rule on a pool, the command wants the rule's name (ssdrule), not its ID (2).
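
I.e. in your case:

ceph osd pool set test crush_rule ssdrule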

On Thu, Nov 1, 2018 at 1:34 PM Steven Vacaroaia  wrote:

> Hi,
>
> I am trying to create an EC pool on my SSD based OSDs
> and would appreciate it if someone could clarify / provide advice about the following
>
> - best K + M combination for 4 hosts one OSD per host
>   My understanding is that K+M <= the number of OSDs, but using K=2, M=1 does not provide
> any redundancy
>   ( as soon as 1 OSD is down, you cannot write to the pool)
>   Am I right ?
>
> - assigning crush_rule as per documentation does not seem to work
> If I provide all the crush rule details when I create the EC profile, the
> PGs are placed on SSD OSDs AND a crush rule is automatically created.
> Is that the right/new way of doing it?
> EXAMPLE
> ceph osd erasure-code-profile set erasureISA crush-failure-domain=osd k=3
> m=2 crush-root=ssds plugin=isa technique=cauchy crush-device-class=ssd
>
>
>  [root@osd01 ~]#  ceph osd crush rule ls
> replicated_rule
> erasure-code
> ssdrule
> [root@osd01 ~]# ceph osd crush rule dump ssdrule
> {
> "rule_id": 2,
> "rule_name": "ssdrule",
> "ruleset": 2,
> "type": 1,
> "min_size": 1,
> "max_size": 10,
> "steps": [
> {
> "op": "take",
> "item": -4,
> "item_name": "ssds"
> },
> {
> "op": "chooseleaf_firstn",
> "num": 0,
> "type": "host"
> },
> {
> "op": "emit"
> }
> ]
> }
>
> [root@osd01 ~]# ceph osd pool set test crush_rule 2
> Error ENOENT: crush rule 2 does not exist
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Packages for debian in Ceph repo

2018-10-30 Thread David Turner
What version of qemu-img are you using?  I found [1] this when poking
around on my qemu server when checking for rbd support.  This version (note
it's proxmox) has rbd listed as a supported format.

[1]
# qemu-img -V; qemu-img --help|grep rbd
qemu-img version 2.11.2pve-qemu-kvm_2.11.2-1
Copyright (c) 2003-2017 Fabrice Bellard and the QEMU Project developers
Supported formats: blkdebug blkreplay blkverify bochs cloop dmg file ftp
ftps gluster host_cdrom host_device http https iscsi iser luks nbd null-aio
null-co parallels qcow qcow2 qed quorum raw rbd replication sheepdog
throttle vdi vhdx vmdk vpc vvfat zeroinit
On Tue, Oct 30, 2018 at 12:08 PM Kevin Olbrich  wrote:

> Is it possible to use qemu-img with rbd support on Debian Stretch?
> I am on Luminous and try to connect my image-buildserver to load images
> into a ceph pool.
>
> root@buildserver:~# qemu-img convert -p -O raw /target/test-vm.qcow2
>> rbd:rbd_vms_ssd_01/test_vm
>> qemu-img: Unknown protocol 'rbd'
>
>
> Kevin
>
> Am Mo., 3. Sep. 2018 um 12:07 Uhr schrieb Abhishek Lekshmanan <
> abhis...@suse.com>:
>
>> arad...@tma-0.net writes:
>>
>> > Can anyone confirm if the Ceph repos for Debian/Ubuntu contain packages
>> for
>> > Debian? I'm not seeing any, but maybe I'm missing something...
>> >
>> > I'm seeing ceph-deploy install an older version of ceph on the nodes
>> (from the
>> > Debian repo) and then failing when I run "ceph-deploy osd ..." because
>> ceph-
>> > volume doesn't exist on the nodes.
>> >
>> The newer versions of Ceph (from mimic onwards) require compiler
>> toolchains supporting C++17, which we unfortunately do not have for
>> stretch/jessie yet.
>>
>> -
>> Abhishek
>> ___
>> ceph-users mailing list
>> ceph-users@lists.ceph.com
>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Balancer module not balancing perfectly

2018-10-30 Thread David Turner
From the balancer module's code for v12.2.7 I noticed [1] these lines
which reference [2] these 2 config options for upmap. You might try using
more max iterations or a smaller max deviation to see if you can get a
better balance in your cluster. I would try to start with [3] these
commands/values and see if it improves your balance and/or allows you to
generate a better map.

[1]
https://github.com/ceph/ceph/blob/v12.2.7/src/pybind/mgr/balancer/module.py#L671-L672
[2] upmap_max_iterations (default 10)
upmap_max_deviation (default .01)
[3] ceph config-key set mgr/balancer/upmap_max_iterations 50
ceph config-key set mgr/balancer/upmap_max_deviation .005
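After changing those keys, re-running the optimizer should tell you whether a
better map is possible; roughly (a sketch, 'myplan' is just a placeholder plan name):
ceph balancer eval
ceph balancer optimize myplan
ceph balancer show myplan
ceph balancer eval myplan
ceph balancer execute myplan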

On Tue, Oct 30, 2018 at 11:14 AM Steve Taylor 
wrote:

> I have a Luminous 12.2.7 cluster with 2 EC pools, both using k=8 and
> m=2. Each pool lives on 20 dedicated OSD hosts with 18 OSDs each. Each
> pool has 2048 PGs and is distributed across its 360 OSDs with host
> failure domains. The OSDs are identical (4TB) and are weighted with
> default weights (3.73).
>
> Initially, and not surprisingly, the PG distribution was all over the
> place with PG counts per OSD ranging from 40 to 83. I enabled the
> balancer module in upmap mode and let it work its magic, which reduced
> the range of the per-OSD PG counts to 56-61.
>
> While 56-61 is obviously a whole lot better than 40-83, with upmap I
> expected the range to be 56-57. If I run 'ceph balancer optimize
> ' again to attempt to create a new plan I get 'Error EALREADY:
> Unable to find further optimization,or distribution is already
> perfect.' I set the balancer's max_misplaced value to 1 in case that
> was preventing further optimization, but I still get the same error.
>
> I'm sure I'm missing some config option or something that will allow it
> to do better, but thus far I haven't been able to find anything in the
> docs, mailing list archives, or balancer source code that helps. Any
> ideas?
>
>
> Steve Taylor | Senior Software Engineer | StorageCraft Technology
> Corporation
> 380 Data Drive Suite 300 | Draper | Utah | 84020
> Office: 801.871.2799 <(801)%20871-2799> |
>
> If you are not the intended recipient of this message or received it
> erroneously, please notify the sender and delete it, together with any
> attachments, and be advised that any dissemination or copying of this
> message is prohibited.
>
>
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] OSD node reinstallation

2018-10-30 Thread David Turner
Basically it's a good idea to back up your /etc/ceph/ folder before you
reinstall the node. Most everything you need for your OSDs will be in there.
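Something along these lines before the reinstall (a sketch; adjust the backup
path, and the bootstrap-osd keyring location assumes a default cephx setup):
ceph osd set noout
tar czf /root/ceph-conf-backup.tgz /etc/ceph /var/lib/ceph/bootstrap-osd
# reinstall the OS, install ceph, restore the backup, start the OSDs, then:
ceph osd unset noout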

On Tue, Oct 30, 2018, 6:01 AM Luiz Gustavo Tonello <
gustavo.tone...@gmail.com> wrote:

> Thank you guys,
>
> It'll save me a bunch of time, because the process to reallocate OSD files
> is not so fast. :-)
>
>
>
> On Tue, Oct 30, 2018 at 6:15 AM Alexandru Cucu  wrote:
>
>> Don't forget about the cephx keyring if you are using cephx ;)
>>
>> Usually sits in:
>> /var/lib/ceph/bootstrap-osd/ceph.keyring
>>
>> ---
>> Alex
>>
>> On Tue, Oct 30, 2018 at 4:48 AM David Turner 
>> wrote:
>> >
>> > Set noout, reinstall the OS without wiping the OSDs (including any
>> journal partitions and maintaining any dmcrypt keys if you have
>> encryption), install ceph, make sure the ceph.conf file is correct, then
>> start OSDs, unset noout once they're back up and in. All of the data the
>> OSD needs to start is on the OSD itself.
>> >
>> > On Mon, Oct 29, 2018, 6:52 PM Luiz Gustavo Tonello <
>> gustavo.tone...@gmail.com> wrote:
>> >>
>> >> Hi list,
>> >>
>> >> I have a situation that I need to reinstall the O.S. of a single node
>> in my OSD cluster.
>> >> This node has 4 OSDs configured, each one has ~4 TB used.
>> >>
>> >> The way that I'm thinking to proceed is to put OSD down (one each
>> time), stop the OSD, reinstall the O.S., and finally add the OSDs again.
>> >>
>> >> But I want to know if there's a way to do this in a more simple
>> process, maybe put OSD in maintenance (noout), reinstall the O.S. without
>> formatting my Storage volumes, install CEPH again and enable OSDs again.
>> >>
>> >> There's a way like these?
>> >>
>> >> I'm running CEPH Jewel.
>> >>
>> >> Best,
>> >> --
>> >> Luiz Gustavo P Tonello.
>> >> ___
>> >> ceph-users mailing list
>> >> ceph-users@lists.ceph.com
>> >> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>> >
>> > ___
>> > ceph-users mailing list
>> > ceph-users@lists.ceph.com
>> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>
>
>
> --
> Luiz Gustavo P Tonello.
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] OSD node reinstallation

2018-10-29 Thread David Turner
Set noout, reinstall the OS without wiping the OSDs (including any journal
partitions and maintaining any dmcrypt keys if you have encryption),
install ceph, make sure the ceph.conf file is correct, then start the OSDs,
and unset noout once they're back up and in. All of the data the OSD needs
to start is on the OSD itself.

On Mon, Oct 29, 2018, 6:52 PM Luiz Gustavo Tonello <
gustavo.tone...@gmail.com> wrote:

> Hi list,
>
> I have a situation that I need to reinstall the O.S. of a single node in
> my OSD cluster.
> This node has 4 OSDs configured, each one has ~4 TB used.
>
> The way that I'm thinking to proceed is to put OSD down (one each time),
> stop the OSD, reinstall the O.S., and finally add the OSDs again.
>
> But I want to know if there's a way to do this in a more simple process,
> maybe put OSD in maintenance (noout), reinstall the O.S. without formatting
> my Storage volumes, install CEPH again and enable OSDs again.
>
> There's a way like these?
>
> I'm running CEPH Jewel.
>
> Best,
> --
> Luiz Gustavo P Tonello.
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] reducing min_size on erasure coded pool may allow recovery ?

2018-10-29 Thread David Turner
min_size should be at least k+1 for EC. There are times to use k for
emergencies like you had. I would suggest setting it back to 3 once you're
back to healthy.

As far as why you needed to reduce min_size, my guess would be that
recovery would have happened as long as k copies were up. Were the PGs
refusing to backfill, or had they just not backfilled yet?
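To change it, something like this (assuming the k=2 m=2 pool you described;
the pool name is a placeholder):
ceph osd pool set <ec-pool-name> min_size 2   # temporarily, while recovering
ceph osd pool set <ec-pool-name> min_size 3   # back to k+1 once healthy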

On Mon, Oct 29, 2018, 9:24 PM Chad W Seys  wrote:

> Hi all,
>Recently our cluster lost a drive and a node (3 drives) at the same
> time.  Our erasure coded pools are all k2m2, so if all is working
> correctly no data is lost.
>However, there were 4 PGs that stayed "incomplete" until I finally
> took the suggestion in 'ceph health detail' to reduce min_size . (Thanks
> for the hint!)  I'm not sure what it was (likely 3), but setting it to 2
> caused all PGs to become active (though degraded) and the cluster is on
> path to recovering fully.
>
>In replicated pools, would not ceph create replicas without the need
> to reduce min_size?  It seems odd to not recover automatically if
> possible.  Could someone explain what was going on there?
>
>Also, how to decide what min_size should be?
>
> Thanks!
> Chad.
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Need advise on proper cluster reweighing

2018-10-28 Thread David Turner
Which version of Ceph are you running? Do you have any kernel clients? If
yes, which kernel version? These questions are all leading to whether
you can enable the Luminous/Mimic mgr module balancer with upmap. If you
can, it is hands down the best way to balance your cluster.
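If the versions check out, enabling it is roughly (a sketch; the first
command will refuse to run if any pre-Luminous client is still connected):
ceph osd set-require-min-compat-client luminous
ceph mgr module enable balancer
ceph balancer mode upmap
ceph balancer on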

On Sat, Oct 27, 2018, 9:14 PM Alex Litvak 
wrote:

> I have a cluster using 2 roots.  I attempted to reweigh osds under the
> "default" root used by pool rbd, cephfs-data, cephfs-meta using Cern
> script: crush-reweight-by-utilization.py.  I ran it first and it showed
> 4 candidates (per script default ), it shows final weight and single
> step movements.
>
>   ./crush-reweight-by-utilization.py --pool=rbd
> osd.36 (1.273109 >= 0.675607) [1.00 -> 0.99]
> osd.0 (1.243042 >= 0.675607) [1.00 -> 0.99]
> osd.2 (1.231539 >= 0.675607) [1.00 -> 0.99]
> osd.19 (1.228613 >= 0.675607) [1.00 -> 0.99]
>
> Script advises on all osds in the pool (36 of them if mentioned, see
> below).  Is it safe to take osd.36 as only one osd and reweigh it first?
> I attempted to do it and each step caused some more pgs stuck in
> active+unmapped mode.  I didn't proceed to the end at the moment, but if
> I do continue with osd.36 should pgs distribute correctly or my
> assumption is wrong?  Should I use some other approach, i.e. reweighing
> all osds in the pool or recalculating the weights completely?
>
> This is my first attempt to re-balance cluster properly so any clues are
> appreciated.
>
> Below are various diagnostics in anticipation of questions.
>
> Thank you in advance
>
> ./crush-reweight-by-utilization.py --pool=rbd --num-osds=36
> osd.36 (1.273079 >= 0.675594) [1.00 -> 0.99]
> osd.0 (1.243019 >= 0.675594) [1.00 -> 0.99]
> osd.2 (1.231513 >= 0.675594) [1.00 -> 0.99]
> osd.19 (1.228569 >= 0.675594) [1.00 -> 0.99]
> osd.16 (1.228071 >= 0.675594) [1.00 -> 0.99]
> osd.46 (1.220588 >= 0.675594) [1.00 -> 0.99]
> osd.23 (1.215887 >= 0.675594) [1.00 -> 0.99]
> osd.7 (1.204189 >= 0.675594) [1.00 -> 0.99]
> osd.10 (1.202385 >= 0.675594) [1.00 -> 0.99]
> osd.40 (1.186002 >= 0.675594) [1.00 -> 0.99]
> osd.43 (1.180218 >= 0.675594) [1.00 -> 0.99]
> osd.21 (1.180050 >= 0.675594) [1.00 -> 0.99]
> osd.15 (1.162953 >= 0.675594) [1.00 -> 0.99]
> osd.1 (1.155985 >= 0.675594) [1.00 -> 0.99]
> osd.44 (1.151496 >= 0.675594) [1.00 -> 0.99]
> osd.39 (1.149947 >= 0.675594) [1.00 -> 0.99]
> osd.22 (1.148013 >= 0.675594) [1.00 -> 0.99]
> osd.8 (1.143455 >= 0.675594) [1.00 -> 0.99]
> osd.37 (1.130054 >= 0.675594) [1.00 -> 0.99]
> osd.18 (1.126777 >= 0.675594) [1.00 -> 0.99]
> osd.17 (1.125752 >= 0.675594) [1.00 -> 0.99]
> osd.9 (1.124679 >= 0.675594) [1.00 -> 0.99]
> osd.42 (1.110069 >= 0.675594) [1.00 -> 0.99]
> osd.4 (1.108986 >= 0.675594) [1.00 -> 0.99]
> osd.45 (1.102144 >= 0.675594) [1.00 -> 0.99]
> osd.12 (1.085402 >= 0.675594) [1.00 -> 0.99]
> osd.38 (1.083698 >= 0.675594) [1.00 -> 0.99]
> osd.5 (1.076138 >= 0.675594) [1.00 -> 0.99]
> osd.11 (1.075955 >= 0.675594) [1.00 -> 0.99]
> osd.13 (1.070176 >= 0.675594) [1.00 -> 0.99]
> osd.20 (1.063759 >= 0.675594) [1.00 -> 0.99]
> osd.14 (1.052357 >= 0.675594) [1.00 -> 0.99]
> osd.41 (1.035255 >= 0.675594) [1.00 -> 0.99]
> osd.3 (1.013664 >= 0.675594) [1.00 -> 0.99]
> osd.47 (1.011428 >= 0.675594) [1.00 -> 0.99]
> osd.6 (1.000170 >= 0.675594) [1.00 -> 0.99]
>
> # ceph osd df tree
> ID  WEIGHT   REWEIGHT SIZE   USEAVAIL  %USE  VAR  TYPE NAME
>
> -10 18.0- 20100G  7127G 12973G 35.46 0.63 root 12g
>
>   -9 18.0- 20100G  7127G 12973G 35.46 0.63 datacenter
> la-12g
>   -5  6.0-  6700G  2375G  4324G 35.45 0.63 host
> oss4-la-12g
>   24  1.0  1.0  1116G   409G   706G 36.71 0.65
> osd.24
>   26  1.0  1.0  1116G   373G   743G 33.43 0.59
> osd.26
>   28  1.0  1.0  1116G   414G   702G 37.10 0.66
> osd.28
>   30  1.0  1.0  1116G   453G   663G 40.60 0.72
> osd.30
>   32  1.0  1.0  1116G   342G   774G 30.65 0.54
> osd.32
>   34  1.0  1.0  1116G   382G   734G 34.23 0.61
> osd.34
>   -6  6.0-  6700G  2375G  4324G 35.45 0.63 host
> oss5-la-12g
>   25  1.0  1.0  1116G   383G   733G 34.32 0.61
> osd.25
>   27  1.0  1.0  1116G   388G   728G 34.75 0.62
> osd.27
>   29  1.0  1.0  1116G   381G   734G 34.19 0.61
> osd.29
>   31  1.0  1.0  1116G   424G   692G 38.00 0.67
> osd.31
>   33  1.0  1.0  1116G   418G   698G 37.46 0.67
> osd.33
>   35  1.0  1.0  1116G   379G   736G 34.02 0.60
> osd.35
>   -7  6.0-  6700G  2376G  4323G 35.47 0.63 host
> oss6-la-12g
>   48  1.0  1.0  1116G   410G   705G 36.79 0.65
> osd.48
>   49  1.0  1.0  1116G 

Re: [ceph-users] Verifying the location of the wal

2018-10-28 Thread David Turner
If you had a specific location for the wal it would show up there. If
there is no entry for the wal, then it is sharing the same device as the db.
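One quick way to check is the OSD metadata (a sketch, osd.0 as a placeholder id):
ceph osd metadata 0 | grep -E 'bluefs_db_partition_path|bluefs_wal_partition_path|bluestore_bdev_partition_path'
If bluefs_wal_partition_path is missing from the output, the WAL lives on the
same device as the DB.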

On Sun, Oct 28, 2018, 9:26 PM Robert Stanford 
wrote:

>
>  Mehmet: it doesn't look like wal is mentioned in the osd metadata.  I see
> bluefs slow, bluestore bdev, and bluefs db mentioned only.
>
> On Sun, Oct 28, 2018 at 1:48 PM  wrote:
>
>> IIRC there is a Command like
>>
>> Ceph osd Metadata
>>
>> Where you should be able to find Information like this
>>
>> Hab
>> - Mehmet
>>
>> Am 21. Oktober 2018 19:39:58 MESZ schrieb Robert Stanford <
>> rstanford8...@gmail.com>:
>>>
>>>
>>>  I did exactly this when creating my osds, and found that my total
>>> utilization is about the same as the sum of the utilization of the pools,
>>> plus (wal size * number osds).  So it looks like my wals are actually
>>> sharing OSDs.  But I'd like to be 100% sure... so I am seeking a way to
>>> find out
>>>
>>> On Sun, Oct 21, 2018 at 11:13 AM Serkan Çoban 
>>> wrote:
>>>
 wal and db device will be same if you use just db path during osd
 creation. i do not know how to verify this with ceph commands.
 On Sun, Oct 21, 2018 at 4:17 PM Robert Stanford <
 rstanford8...@gmail.com> wrote:
 >
 >
 >  Thanks Serkan.  I am using --path instead of --dev (dev won't work
 because I'm using VGs/LVs).  The output shows block and block.db, but
 nothing about wal.db.  How can I learn where my wal lives?
 >
 >
 >
 >
 > On Sun, Oct 21, 2018 at 12:43 AM Serkan Çoban 
 wrote:
 >>
 >> ceph-bluestore-tool can show you the disk labels.
 >> ceph-bluestore-tool show-label --dev /dev/sda1
 >> On Sun, Oct 21, 2018 at 1:29 AM Robert Stanford <
 rstanford8...@gmail.com> wrote:
 >> >
 >> >
 >> >  An email from this list stated that the wal would be created in
 the same place as the db, if the db were specified when running ceph-volume
 lvm create, and the db were specified on that command line.  I followed
 those instructions and like the other person writing to this list today, I
 was surprised to find that my cluster usage was higher than the total of
 pools (higher by an amount the same as all my wal sizes on each node
 combined).  This leads me to think my wal actually is on the data disk and
 not the ssd I specified the db should go to.
 >> >
 >> >  How can I verify which disk the wal is on, from the command
 line?  I've searched the net and not come up with anything.
 >> >
 >> >  Thanks and regards
 >> >  R
 >> >
 >> > ___
 >> > ceph-users mailing list
 >> > ceph-users@lists.ceph.com
 >> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

>>> ___
>> ceph-users mailing list
>> ceph-users@lists.ceph.com
>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Migrate/convert replicated pool to EC?

2018-10-26 Thread David Turner
It is indeed adding a placement target, not removing or replacing the
pool. The get/put wouldn't be a rados or even a ceph command; you would do
it through an S3 client.

On Fri, Oct 26, 2018, 9:38 AM Matthew Vernon  wrote:

> Hi,
>
> On 26/10/2018 12:38, Alexandru Cucu wrote:
>
> > Have a look at this article:>
> https://ceph.com/geen-categorie/ceph-pool-migration/
>
> Thanks; that all looks pretty hairy especially for a large pool (ceph df
> says 1353T / 428,547,935 objects)...
>
> ...so something a bit more controlled/gradual and less
> manual-error-prone would make me happier!
>
> Regards,
>
> Matthew
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] RGW: move bucket from one placement to another

2018-10-25 Thread David Turner
Resharding a bucket won't affect the data in the bucket.  After you change
the placement for a bucket, you could update where the data is by
re-writing all of the data in the bucket.
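One way to force that rewrite is an in-place copy through an S3 client, for
example with the AWS CLI (a sketch; the bucket name and endpoint are
placeholders, and the metadata-directive trick is what allows copying objects
onto themselves):
aws s3 cp s3://mybucket s3://mybucket --recursive --metadata-directive REPLACE --endpoint-url http://rgw.example.com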

On Thu, Oct 25, 2018 at 8:48 AM Jacek Suchenia 
wrote:

> Hi
>
> We have a bucket created with LocationConstraint setting, so
> explicit_placement entries are filled in a bucket. Is there a way to move
> it to other placement?
> I was thinking about editing that data and run manual resharding, but I
> don't know if it's a correct way of solving this problem.
>
> Jacek
>
> --
> Jacek Suchenia
> jacek.suche...@gmail.com
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Migrate/convert replicated pool to EC?

2018-10-25 Thread David Turner
There are no tools to migrate in either direction between EC and Replica.
You can't even migrate an EC pool to a new EC profile.

With RGW you can create a new data pool and new objects will be written to
the new pool. If your objects have a lifecycle, then eventually you'll be
fully on the new pool over time. Otherwise you can get there by rewriting all of
the objects manually.
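A rough sketch of the new-data-pool route (names, the k/m values and the PG
count are placeholders; the placement change only affects newly written
objects, and the RGWs need a restart to pick it up):
ceph osd erasure-code-profile set ec42 k=4 m=2 crush-failure-domain=host
ceph osd pool create default.rgw.buckets.data.ec 2048 2048 erasure ec42
ceph osd pool application enable default.rgw.buckets.data.ec rgw
radosgw-admin zone placement modify --rgw-zone=default --placement-id=default-placement --data-pool=default.rgw.buckets.data.ec
radosgw-admin period update --commit   # only if you run a realm/multisite setup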

On Thu, Oct 25, 2018, 12:30 PM Matthew Vernon  wrote:

> Hi,
>
> I thought I'd seen that it was possible to migrate a replicated pool to
> being erasure-coded (but not the converse); but I'm failing to find
> anything that says _how_.
>
> Have I misremembered? Can you migrate a replicated pool to EC? (if so,
> how?)
>
> ...our use case is moving our S3 pool which is quite large, so if we can
> convert in-place that would be ideal...
>
> Thanks,
>
> Matthew
>
>
> --
>  The Wellcome Sanger Institute is operated by Genome Research
>  Limited, a charity registered in England with number 1021457 and a
>  company registered in England with number 2742969, whose registered
>  office is 215 Euston Road, London, NW1 2BE
> .
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Drive for Wal and Db

2018-10-22 Thread David Turner
I don't have enough disk space on the NVMe. The DB would overflow before I
reached 25% utilization in the cluster. The disks are 10TB spinners and
would need a minimum of 100 GB of DB space based on early testing.
The official docs recommend a 400GB DB for a disk this size. I don't have
enough flash space for that in the 2x NVMe disks in those servers.  Hence I
put the WAL on the NVMes and left the DB on the data disk, where it would
have spilled over to almost immediately anyway.

On Mon, Oct 22, 2018, 6:55 PM solarflow99  wrote:

> Why didn't you just install the DB + WAL on the NVMe?  Is this "data disk"
> still an ssd?
>
>
>
> On Mon, Oct 22, 2018 at 3:34 PM David Turner 
> wrote:
>
>> And by the data disk I mean that I didn't specify a location for the DB
>> partition.
>>
>> On Mon, Oct 22, 2018 at 4:06 PM David Turner 
>> wrote:
>>
>>> Track down where it says they point to?  Does it match what you expect?
>>> It does for me.  I have my DB on my data disk and my WAL on a separate NVMe.
>>>
>>> On Mon, Oct 22, 2018 at 3:21 PM Robert Stanford 
>>> wrote:
>>>
>>>>
>>>>  David - is it ensured that wal and db both live where the symlink
>>>> block.db points?  I assumed that was a symlink for the db, but necessarily
>>>> for the wal, because it can live in a place different than the db.
>>>>
>>>> On Mon, Oct 22, 2018 at 2:18 PM David Turner 
>>>> wrote:
>>>>
>>>>> You can always just go to /var/lib/ceph/osd/ceph-{osd-num}/ and look
>>>>> at where the symlinks for block and block.wal point to.
>>>>>
>>>>> On Mon, Oct 22, 2018 at 12:29 PM Robert Stanford <
>>>>> rstanford8...@gmail.com> wrote:
>>>>>
>>>>>>
>>>>>>  That's what they say, however I did exactly this and my cluster
>>>>>> utilization is higher than the total pool utilization by about the number
>>>>>> of OSDs * wal size.  I want to verify that the wal is on the SSDs too but
>>>>>> I've asked here and no one seems to know a way to verify this.  Do you?
>>>>>>
>>>>>>  Thank you, R
>>>>>>
>>>>>> On Mon, Oct 22, 2018 at 5:22 AM Maged Mokhtar 
>>>>>> wrote:
>>>>>>
>>>>>>>
>>>>>>> If you specify a db on ssd and data on hdd and not explicitly
>>>>>>> specify a
>>>>>>> device for wal, wal will be placed on same ssd partition with db.
>>>>>>> Placing only wal on ssd or creating separate devices for wal and db
>>>>>>> are
>>>>>>> less common setups.
>>>>>>>
>>>>>>> /Maged
>>>>>>>
>>>>>>> On 22/10/18 09:03, Fyodor Ustinov wrote:
>>>>>>> > Hi!
>>>>>>> >
>>>>>>> > For sharing SSD between WAL and DB what should be placed on SSD?
>>>>>>> WAL or DB?
>>>>>>> >
>>>>>>> > - Original Message -
>>>>>>> > From: "Maged Mokhtar" 
>>>>>>> > To: "ceph-users" 
>>>>>>> > Sent: Saturday, 20 October, 2018 20:05:44
>>>>>>> > Subject: Re: [ceph-users] Drive for Wal and Db
>>>>>>> >
>>>>>>> > On 20/10/18 18:57, Robert Stanford wrote:
>>>>>>> >
>>>>>>> >
>>>>>>> >
>>>>>>> >
>>>>>>> > Our OSDs are BlueStore and are on regular hard drives. Each OSD
>>>>>>> has a partition on an SSD for its DB. Wal is on the regular hard drives.
>>>>>>> Should I move the wal to share the SSD with the DB?
>>>>>>> >
>>>>>>> > Regards
>>>>>>> > R
>>>>>>> >
>>>>>>> >
>>>>>>> > ___
>>>>>>> > ceph-users mailing list [ mailto:ceph-users@lists.ceph.com |
>>>>>>> ceph-users@lists.ceph.com ] [
>>>>>>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com |
>>>>>>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com ]
>>>>>>> >
>>>>>>> > you should put wal on the faster device, wal and db could share
>>>>>>> the same ssd partition,
>>>>>>> >
>>>>>>> > Maged
>>>>>>> >
>>>>>>> > ___
>>>>>>> > ceph-users mailing list
>>>>>>> > ceph-users@lists.ceph.com
>>>>>>> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>>>>>> > ___
>>>>>>> > ceph-users mailing list
>>>>>>> > ceph-users@lists.ceph.com
>>>>>>> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>>>>>>
>>>>>>> ___
>>>>>>> ceph-users mailing list
>>>>>>> ceph-users@lists.ceph.com
>>>>>>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>>>>>>
>>>>>> ___
>>>>>> ceph-users mailing list
>>>>>> ceph-users@lists.ceph.com
>>>>>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>>>>>
>>>>> ___
>> ceph-users mailing list
>> ceph-users@lists.ceph.com
>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Drive for Wal and Db

2018-10-22 Thread David Turner
No, it's exactly what I told you it was.  "bluestore_bdev_partition_path"
is the data path.  In all of my scenarios my DB and Data are on the same
partition, hence mine are the same.  Your DB and WAL are on a different
partition from your Data... so your DB partition is different... Whatever
your misunderstanding is about where/why your cluster's usage is
higher/different than you think it is, it has nothing to do with where your
DB and WAL partitions are.

There is an overhead just for having a FS on the disk.  In this case that FS
is bluestore.  You can look at [1] this ML thread from a while ago where I
mentioned a brand new cluster with no data in it and the WAL partitions on
separate disks that it was using about 1.1GB of data per OSD.

[1]
http://lists.ceph.com/pipermail/ceph-users-ceph.com/2018-March/025246.html
On Mon, Oct 22, 2018 at 4:51 PM Robert Stanford 
wrote:

>
>  That's very helpful, thanks.  In your first case above your
> bluefs_db_partition_path and bluestore_bdev_partition path are the same.
> Though I have a different data and db drive, mine are different.  Might
> this explain something?  My root concern is that there is more utilization
> on the cluster than what's in the pools, the excess equal to about wal size
> * number of osds...
>
> On Mon, Oct 22, 2018 at 3:35 PM David Turner 
> wrote:
>
>> My DB doesn't have a specific partition anywhere, but there's still a
>> symlink for it to the data partition.  On my home cluster with all DB, WAL,
>> and Data on the same disk without any partitions specified there is a block
>> symlink but no block.wal symlink.
>>
>> For the cluster with a specific WAL partition, but no DB partition, my
>> OSD paths looks like [1] this.  For my cluster with everything on the same
>> disk, my OSD paths look like [2] this.  Unless you have a specific path for
>> "bluefs_wal_partition_path" then it's going to find itself on the same
>> partition as the db.
>>
>> [1] $ ceph osd metadata 5 | grep path
>> "bluefs_db_partition_path": "/dev/dm-29",
>> "bluefs_wal_partition_path": "/dev/dm-41",
>> "bluestore_bdev_partition_path": "/dev/dm-29",
>>
>> [2] $ ceph osd metadata 5 | grep path
>> "bluefs_db_partition_path": "/dev/dm-5",
>> "bluestore_bdev_partition_path": "/dev/dm-5",
>>
>> On Mon, Oct 22, 2018 at 4:21 PM Robert Stanford 
>> wrote:
>>
>>>
>>>  Let me add, I have no block.wal file (which the docs suggest should be
>>> there).
>>> http://docs.ceph.com/docs/master/rados/configuration/bluestore-config-ref/
>>>
>>> On Mon, Oct 22, 2018 at 3:13 PM Robert Stanford 
>>> wrote:
>>>
>>>>
>>>>  We're out of sync, I think.  You have your DB on your data disk so
>>>> your block.db symlink points to that disk, right?  There is however no wal
>>>> symlink?  So how would you verify your WAL actually lived on your NVMe?
>>>>
>>>> On Mon, Oct 22, 2018 at 3:07 PM David Turner 
>>>> wrote:
>>>>
>>>>> And by the data disk I mean that I didn't specify a location for the
>>>>> DB partition.
>>>>>
>>>>> On Mon, Oct 22, 2018 at 4:06 PM David Turner 
>>>>> wrote:
>>>>>
>>>>>> Track down where it says they point to?  Does it match what you
>>>>>> expect?  It does for me.  I have my DB on my data disk and my WAL on a
>>>>>> separate NVMe.
>>>>>>
>>>>>> On Mon, Oct 22, 2018 at 3:21 PM Robert Stanford <
>>>>>> rstanford8...@gmail.com> wrote:
>>>>>>
>>>>>>>
>>>>>>>  David - is it ensured that wal and db both live where the symlink
>>>>>>> block.db points?  I assumed that was a symlink for the db, but 
>>>>>>> necessarily
>>>>>>> for the wal, because it can live in a place different than the db.
>>>>>>>
>>>>>>> On Mon, Oct 22, 2018 at 2:18 PM David Turner 
>>>>>>> wrote:
>>>>>>>
>>>>>>>> You can always just go to /var/lib/ceph/osd/ceph-{osd-num}/ and
>>>>>>>> look at where the symlinks for block and block.wal point to.
>>>>>>>>
>>>>>>>> On Mon, Oct 22, 2018 at 12:29 PM Robert Stanford <
>>>>>>>> rstanford8...@gmail.com> wrote:
>>>>>>>>
>>>>>>

Re: [ceph-users] Drive for Wal and Db

2018-10-22 Thread David Turner
My DB doesn't have a specific partition anywhere, but there's still a
symlink for it to the data partition.  On my home cluster with all DB, WAL,
and Data on the same disk without any partitions specified there is a block
symlink but no block.wal symlink.

For the cluster with a specific WAL partition, but no DB partition, my OSD
paths looks like [1] this.  For my cluster with everything on the same
disk, my OSD paths look like [2] this.  Unless you have a specific path for
"bluefs_wal_partition_path" then it's going to find itself on the same
partition as the db.

[1] $ ceph osd metadata 5 | grep path
"bluefs_db_partition_path": "/dev/dm-29",
"bluefs_wal_partition_path": "/dev/dm-41",
"bluestore_bdev_partition_path": "/dev/dm-29",

[2] $ ceph osd metadata 5 | grep path
"bluefs_db_partition_path": "/dev/dm-5",
"bluestore_bdev_partition_path": "/dev/dm-5",

On Mon, Oct 22, 2018 at 4:21 PM Robert Stanford 
wrote:

>
>  Let me add, I have no block.wal file (which the docs suggest should be
> there).
> http://docs.ceph.com/docs/master/rados/configuration/bluestore-config-ref/
>
> On Mon, Oct 22, 2018 at 3:13 PM Robert Stanford 
> wrote:
>
>>
>>  We're out of sync, I think.  You have your DB on your data disk so your
>> block.db symlink points to that disk, right?  There is however no wal
>> symlink?  So how would you verify your WAL actually lived on your NVMe?
>>
>> On Mon, Oct 22, 2018 at 3:07 PM David Turner 
>> wrote:
>>
>>> And by the data disk I mean that I didn't specify a location for the DB
>>> partition.
>>>
>>> On Mon, Oct 22, 2018 at 4:06 PM David Turner 
>>> wrote:
>>>
>>>> Track down where it says they point to?  Does it match what you
>>>> expect?  It does for me.  I have my DB on my data disk and my WAL on a
>>>> separate NVMe.
>>>>
>>>> On Mon, Oct 22, 2018 at 3:21 PM Robert Stanford <
>>>> rstanford8...@gmail.com> wrote:
>>>>
>>>>>
>>>>>  David - is it ensured that wal and db both live where the symlink
>>>>> block.db points?  I assumed that was a symlink for the db, but necessarily
>>>>> for the wal, because it can live in a place different than the db.
>>>>>
>>>>> On Mon, Oct 22, 2018 at 2:18 PM David Turner 
>>>>> wrote:
>>>>>
>>>>>> You can always just go to /var/lib/ceph/osd/ceph-{osd-num}/ and look
>>>>>> at where the symlinks for block and block.wal point to.
>>>>>>
>>>>>> On Mon, Oct 22, 2018 at 12:29 PM Robert Stanford <
>>>>>> rstanford8...@gmail.com> wrote:
>>>>>>
>>>>>>>
>>>>>>>  That's what they say, however I did exactly this and my cluster
>>>>>>> utilization is higher than the total pool utilization by about the 
>>>>>>> number
>>>>>>> of OSDs * wal size.  I want to verify that the wal is on the SSDs too 
>>>>>>> but
>>>>>>> I've asked here and no one seems to know a way to verify this.  Do you?
>>>>>>>
>>>>>>>  Thank you, R
>>>>>>>
>>>>>>> On Mon, Oct 22, 2018 at 5:22 AM Maged Mokhtar 
>>>>>>> wrote:
>>>>>>>
>>>>>>>>
>>>>>>>> If you specify a db on ssd and data on hdd and not explicitly
>>>>>>>> specify a
>>>>>>>> device for wal, wal will be placed on same ssd partition with db.
>>>>>>>> Placing only wal on ssd or creating separate devices for wal and db
>>>>>>>> are
>>>>>>>> less common setups.
>>>>>>>>
>>>>>>>> /Maged
>>>>>>>>
>>>>>>>> On 22/10/18 09:03, Fyodor Ustinov wrote:
>>>>>>>> > Hi!
>>>>>>>> >
>>>>>>>> > For sharing SSD between WAL and DB what should be placed on SSD?
>>>>>>>> WAL or DB?
>>>>>>>> >
>>>>>>>> > - Original Message -
>>>>>>>> > From: "Maged Mokhtar" 
>>>>>>>> > To: "ceph-users" 
>>>>>>>> > Sent: Saturday, 20 October, 2018 20:05:44
>>>>>>>> > Subject: Re: [ce

Re: [ceph-users] Drive for Wal and Db

2018-10-22 Thread David Turner
Track down where it says they point to?  Does it match what you expect?  It
does for me.  I have my DB on my data disk and my WAL on a separate NVMe.

On Mon, Oct 22, 2018 at 3:21 PM Robert Stanford 
wrote:

>
>  David - is it ensured that wal and db both live where the symlink
> block.db points?  I assumed that was a symlink for the db, but necessarily
> for the wal, because it can live in a place different than the db.
>
> On Mon, Oct 22, 2018 at 2:18 PM David Turner 
> wrote:
>
>> You can always just go to /var/lib/ceph/osd/ceph-{osd-num}/ and look at
>> where the symlinks for block and block.wal point to.
>>
>> On Mon, Oct 22, 2018 at 12:29 PM Robert Stanford 
>> wrote:
>>
>>>
>>>  That's what they say, however I did exactly this and my cluster
>>> utilization is higher than the total pool utilization by about the number
>>> of OSDs * wal size.  I want to verify that the wal is on the SSDs too but
>>> I've asked here and no one seems to know a way to verify this.  Do you?
>>>
>>>  Thank you, R
>>>
>>> On Mon, Oct 22, 2018 at 5:22 AM Maged Mokhtar 
>>> wrote:
>>>
>>>>
>>>> If you specify a db on ssd and data on hdd and not explicitly specify a
>>>> device for wal, wal will be placed on same ssd partition with db.
>>>> Placing only wal on ssd or creating separate devices for wal and db are
>>>> less common setups.
>>>>
>>>> /Maged
>>>>
>>>> On 22/10/18 09:03, Fyodor Ustinov wrote:
>>>> > Hi!
>>>> >
>>>> > For sharing SSD between WAL and DB what should be placed on SSD? WAL
>>>> or DB?
>>>> >
>>>> > - Original Message -
>>>> > From: "Maged Mokhtar" 
>>>> > To: "ceph-users" 
>>>> > Sent: Saturday, 20 October, 2018 20:05:44
>>>> > Subject: Re: [ceph-users] Drive for Wal and Db
>>>> >
>>>> > On 20/10/18 18:57, Robert Stanford wrote:
>>>> >
>>>> >
>>>> >
>>>> >
>>>> > Our OSDs are BlueStore and are on regular hard drives. Each OSD has a
>>>> partition on an SSD for its DB. Wal is on the regular hard drives. Should I
>>>> move the wal to share the SSD with the DB?
>>>> >
>>>> > Regards
>>>> > R
>>>> >
>>>> >
>>>> > ___
>>>> > ceph-users mailing list [ mailto:ceph-users@lists.ceph.com |
>>>> ceph-users@lists.ceph.com ] [
>>>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com |
>>>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com ]
>>>> >
>>>> > you should put wal on the faster device, wal and db could share the
>>>> same ssd partition,
>>>> >
>>>> > Maged
>>>> >
>>>> > ___
>>>> > ceph-users mailing list
>>>> > ceph-users@lists.ceph.com
>>>> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>>> > ___
>>>> > ceph-users mailing list
>>>> > ceph-users@lists.ceph.com
>>>> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>>>
>>>> ___
>>>> ceph-users mailing list
>>>> ceph-users@lists.ceph.com
>>>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>>>
>>> ___
>>> ceph-users mailing list
>>> ceph-users@lists.ceph.com
>>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>>
>>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Drive for Wal and Db

2018-10-22 Thread David Turner
And by the data disk I mean that I didn't specify a location for the DB
partition.

On Mon, Oct 22, 2018 at 4:06 PM David Turner  wrote:

> Track down where it says they point to?  Does it match what you expect?
> It does for me.  I have my DB on my data disk and my WAL on a separate NVMe.
>
> On Mon, Oct 22, 2018 at 3:21 PM Robert Stanford 
> wrote:
>
>>
>>  David - is it ensured that wal and db both live where the symlink
>> block.db points?  I assumed that was a symlink for the db, but necessarily
>> for the wal, because it can live in a place different than the db.
>>
>> On Mon, Oct 22, 2018 at 2:18 PM David Turner 
>> wrote:
>>
>>> You can always just go to /var/lib/ceph/osd/ceph-{osd-num}/ and look at
>>> where the symlinks for block and block.wal point to.
>>>
>>> On Mon, Oct 22, 2018 at 12:29 PM Robert Stanford <
>>> rstanford8...@gmail.com> wrote:
>>>
>>>>
>>>>  That's what they say, however I did exactly this and my cluster
>>>> utilization is higher than the total pool utilization by about the number
>>>> of OSDs * wal size.  I want to verify that the wal is on the SSDs too but
>>>> I've asked here and no one seems to know a way to verify this.  Do you?
>>>>
>>>>  Thank you, R
>>>>
>>>> On Mon, Oct 22, 2018 at 5:22 AM Maged Mokhtar 
>>>> wrote:
>>>>
>>>>>
>>>>> If you specify a db on ssd and data on hdd and not explicitly specify
>>>>> a
>>>>> device for wal, wal will be placed on same ssd partition with db.
>>>>> Placing only wal on ssd or creating separate devices for wal and db
>>>>> are
>>>>> less common setups.
>>>>>
>>>>> /Maged
>>>>>
>>>>> On 22/10/18 09:03, Fyodor Ustinov wrote:
>>>>> > Hi!
>>>>> >
>>>>> > For sharing SSD between WAL and DB what should be placed on SSD? WAL
>>>>> or DB?
>>>>> >
>>>>> > - Original Message -
>>>>> > From: "Maged Mokhtar" 
>>>>> > To: "ceph-users" 
>>>>> > Sent: Saturday, 20 October, 2018 20:05:44
>>>>> > Subject: Re: [ceph-users] Drive for Wal and Db
>>>>> >
>>>>> > On 20/10/18 18:57, Robert Stanford wrote:
>>>>> >
>>>>> >
>>>>> >
>>>>> >
>>>>> > Our OSDs are BlueStore and are on regular hard drives. Each OSD has
>>>>> a partition on an SSD for its DB. Wal is on the regular hard drives. 
>>>>> Should
>>>>> I move the wal to share the SSD with the DB?
>>>>> >
>>>>> > Regards
>>>>> > R
>>>>> >
>>>>> >
>>>>> > ___
>>>>> > ceph-users mailing list [ mailto:ceph-users@lists.ceph.com |
>>>>> ceph-users@lists.ceph.com ] [
>>>>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com |
>>>>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com ]
>>>>> >
>>>>> > you should put wal on the faster device, wal and db could share the
>>>>> same ssd partition,
>>>>> >
>>>>> > Maged
>>>>> >
>>>>> > ___
>>>>> > ceph-users mailing list
>>>>> > ceph-users@lists.ceph.com
>>>>> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>>>> > ___
>>>>> > ceph-users mailing list
>>>>> > ceph-users@lists.ceph.com
>>>>> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>>>>
>>>>> ___
>>>>> ceph-users mailing list
>>>>> ceph-users@lists.ceph.com
>>>>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>>>>
>>>> ___
>>>> ceph-users mailing list
>>>> ceph-users@lists.ceph.com
>>>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>>>
>>>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Drive for Wal and Db

2018-10-22 Thread David Turner
You can always just go to /var/lib/ceph/osd/ceph-{osd-num}/ and look at
where the symlinks for block and block.wal point to.
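For example (the osd id is a placeholder; a missing block.wal symlink just
means the WAL shares the DB/data device):
ls -l /var/lib/ceph/osd/ceph-0/block*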

On Mon, Oct 22, 2018 at 12:29 PM Robert Stanford 
wrote:

>
>  That's what they say, however I did exactly this and my cluster
> utilization is higher than the total pool utilization by about the number
> of OSDs * wal size.  I want to verify that the wal is on the SSDs too but
> I've asked here and no one seems to know a way to verify this.  Do you?
>
>  Thank you, R
>
> On Mon, Oct 22, 2018 at 5:22 AM Maged Mokhtar 
> wrote:
>
>>
>> If you specify a db on ssd and data on hdd and not explicitly specify a
>> device for wal, wal will be placed on same ssd partition with db.
>> Placing only wal on ssd or creating separate devices for wal and db are
>> less common setups.
>>
>> /Maged
>>
>> On 22/10/18 09:03, Fyodor Ustinov wrote:
>> > Hi!
>> >
>> > For sharing SSD between WAL and DB what should be placed on SSD? WAL or
>> DB?
>> >
>> > - Original Message -
>> > From: "Maged Mokhtar" 
>> > To: "ceph-users" 
>> > Sent: Saturday, 20 October, 2018 20:05:44
>> > Subject: Re: [ceph-users] Drive for Wal and Db
>> >
>> > On 20/10/18 18:57, Robert Stanford wrote:
>> >
>> >
>> >
>> >
>> > Our OSDs are BlueStore and are on regular hard drives. Each OSD has a
>> partition on an SSD for its DB. Wal is on the regular hard drives. Should I
>> move the wal to share the SSD with the DB?
>> >
>> > Regards
>> > R
>> >
>> >
>> > ___
>> > ceph-users mailing list [ mailto:ceph-users@lists.ceph.com |
>> ceph-users@lists.ceph.com ] [
>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com |
>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com ]
>> >
>> > you should put wal on the faster device, wal and db could share the
>> same ssd partition,
>> >
>> > Maged
>> >
>> > ___
>> > ceph-users mailing list
>> > ceph-users@lists.ceph.com
>> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>> > ___
>> > ceph-users mailing list
>> > ceph-users@lists.ceph.com
>> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>
>> ___
>> ceph-users mailing list
>> ceph-users@lists.ceph.com
>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] ceph df space usage confusion - balancing needed?

2018-10-22 Thread David Turner
I haven't had crush-compat do anything helpful for balancing my clusters.
upmap has been amazing and balanced my clusters far better than anything
else I've ever seen.  I would go so far as to say that upmap can achieve a
perfect balance.

It seems to evenly distribute the PGs for each pool onto all OSDs that pool
is on.  It does that with a maximum difference of 1 PG, depending on how
evenly the pool's PG count divides by the number of OSDs you have.  As a
side note, your OSD CRUSH weights should be left at the default weights for
their size for upmap to be as effective as it can be.
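If a weight has drifted from its size-based default, you can put it back with
something like (the osd id and the 3.63899 value for a 4TB drive are placeholders):
ceph osd crush reweight osd.12 3.63899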

On Sat, Oct 20, 2018 at 3:58 PM Oliver Freyermuth <
freyerm...@physik.uni-bonn.de> wrote:

> Ok, I'll try out the balancer end of the upcoming week then (after we've
> fixed a HW-issue with one of our mons
> and the cooling system).
>
> Until then, any further advice and whether upmap is recommended over
> crush-compat (all clients are Luminous) are welcome ;-).
>
> Cheers,
> Oliver
>
> Am 20.10.18 um 21:26 schrieb Janne Johansson:
> > Ok, can't say "why" then, I'd reweigh them somewhat to even it out,
> > 1.22 -vs- 0.74 in variance is a lot, so either a balancer plugin for
> > the MGRs, a script or just a few manual tweaks might be in order.
> >
> > Den lör 20 okt. 2018 kl 21:02 skrev Oliver Freyermuth
> > :
> >>
> >> All OSDs are of the very same size. One OSD host has slightly more
> disks (33 instead of 31), though.
> >> So also that that can't explain the hefty difference.
> >>
> >> I attach the output of "ceph osd tree" and "ceph osd df".
> >>
> >> The crush rule for the ceph_data pool is:
> >> rule cephfs_data {
> >> id 2
> >> type erasure
> >> min_size 3
> >> max_size 6
> >> step set_chooseleaf_tries 5
> >> step set_choose_tries 100
> >> step take default class hdd
> >> step chooseleaf indep 0 type host
> >> step emit
> >> }
> >> So that only considers the hdd device class. EC is done with k=4 m=2.
> >>
> >> So I don't see any imbalance on the hardware level, but only a somewhat
> uneven distribution of PGs.
> >> Am I missing something, or is this really just a case for the ceph
> balancer plugin?
> >> I'm just a bit astonished this effect is so huge.
> >> Maybe our 4096 PGs for the ceph_data pool are not enough to get an even
> distribution without balancing?
> >> But it yields about 100 PGs per OSD, as you can see...
> >>
> >> --
> >> # ceph osd tree
> >> ID  CLASS WEIGHTTYPE NAME   STATUS REWEIGHT PRI-AFF
> >>  -1   826.26428 root default
> >>  -3 0.43700 host mon001
> >>   0   ssd   0.21799 osd.0   up  1.0 1.0
> >>   1   ssd   0.21799 osd.1   up  1.0 1.0
> >>  -5 0.43700 host mon002
> >>   2   ssd   0.21799 osd.2   up  1.0 1.0
> >>   3   ssd   0.21799 osd.3   up  1.0 1.0
> >> -31 1.81898 host mon003
> >> 230   ssd   0.90999 osd.230 up  1.0 1.0
> >> 231   ssd   0.90999 osd.231 up  1.0 1.0
> >> -10   116.64600 host osd001
> >>   4   hdd   3.64499 osd.4   up  1.0 1.0
> >>   5   hdd   3.64499 osd.5   up  1.0 1.0
> >>   6   hdd   3.64499 osd.6   up  1.0 1.0
> >>   7   hdd   3.64499 osd.7   up  1.0 1.0
> >>   8   hdd   3.64499 osd.8   up  1.0 1.0
> >>   9   hdd   3.64499 osd.9   up  1.0 1.0
> >>  10   hdd   3.64499 osd.10  up  1.0 1.0
> >>  11   hdd   3.64499 osd.11  up  1.0 1.0
> >>  12   hdd   3.64499 osd.12  up  1.0 1.0
> >>  13   hdd   3.64499 osd.13  up  1.0 1.0
> >>  14   hdd   3.64499 osd.14  up  1.0 1.0
> >>  15   hdd   3.64499 osd.15  up  1.0 1.0
> >>  16   hdd   3.64499 osd.16  up  1.0 1.0
> >>  17   hdd   3.64499 osd.17  up  1.0 1.0
> >>  18   hdd   3.64499 osd.18  up  1.0 1.0
> >>  19   hdd   3.64499 osd.19  up  1.0 1.0
> >>  20   hdd   3.64499 osd.20  up  1.0 1.0
> >>  21   hdd   3.64499 osd.21  up  1.0 1.0
> >>  22   hdd   3.64499 osd.22  up  1.0 1.0
> >>  23   hdd   3.64499 osd.23  up  1.0 1.0
> >>  24   hdd   3.64499 osd.24  up  1.0 1.0
> >>  25   hdd   3.64499 osd.25  up  1.0 1.0
> >>  26   hdd   3.64499 osd.26  up  1.0 1.0
> >>  27   hdd   3.64499 osd.27  up  1.0 1.0
> >>  28   hdd   3.64499 osd.28  up  1.0 1.0
> >>  29   hdd   3.64499 osd.29  up  1.0 1.0
> >>  30   hdd   3.64499 osd.30  up  1.0 1.0
> >>  31   hdd   3.64499 osd.31  up  1.0 1.0
> >>  32   hdd   

Re: [ceph-users] bluestore compression enabled but no data compressed

2018-10-19 Thread David Turner
1) I don't really know about the documentation.  You can always put
together a PR for an update to the docs.  I only know what I've tested
trying to get compression working.

2) If you have passive in both places, no compression will happen; if
you have aggressive globally for the OSDs and none for the pools, you won't
have any compression happening.  Any pool you set to passive or
aggressive will compress, and vice versa for the OSD setting.  If you have
the pools all set to aggressive, then only OSDs set to passive or
aggressive will compress.  That is useful if you have mixed disks with
flash and spinners, or if you use primary affinity to speed things up.
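
As a concrete sketch of the "only compress specific pools" case (using your
sr-fs-data-test pool as the example; persist the OSD setting in ceph.conf as
well, since the tell only affects running daemons):
ceph tell 'osd.*' config set bluestore_compression_mode passive
ceph osd pool set sr-fs-data-test compression_mode aggressive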

3) I do not know much about the outputs.

4) The only way to compress previously written data is to rewrite it.
There is no process that will compress existing data.

On Fri, Oct 19, 2018 at 7:21 AM Frank Schilder  wrote:

> Hi David,
>
> sorry for the slow response, we had a hell of a week at work.
>
> OK, so I had compression mode set to aggressive on some pools, but the
> global option was not changed, because I interpreted the documentation as
> "pool settings take precedence". To check your advise, I executed
>
>   ceph tell "osd.*" config set bluestore_compression_mode aggressive
>
> and dumped a new file consisting of null-bytes. Indeed, this time I
> observe compressed objects:
>
> [root@ceph-08 ~]# ceph daemon osd.80 perf dump | grep blue
> "bluefs": {
> "bluestore": {
> "bluestore_allocated": 2967207936,
> "bluestore_stored": 3161981179,
> "bluestore_compressed": 24549408,
> "bluestore_compressed_allocated": 261095424,
> "bluestore_compressed_original": 522190848,
>
> Obvious questions that come to my mind:
>
> 1) I think either the documentation is misleading or the implementation is
> not following documented behaviour. I observe that per pool settings do
> *not* override globals, but the documentation says they will. (From doc:
> "Sets the policy for the inline compression algorithm for underlying
> BlueStore. This setting overrides the global setting of bluestore
> compression mode.") Will this be fixed in the future? Should this be
> reported?
>
> Remark: When I look at "compression_mode" under "
> http://docs.ceph.com/docs/luminous/rados/operations/pools/?highlight=bluestore%20compression#set-pool-values;
> it actually looks like a copy-and-paste error. The doc here talks about
> compression algorithm (see quote above) while the compression mode should
> be explained. Maybe that is worth looking at?
>
> 2) If I set the global to aggressive, do I now have to disable compression
> explicitly on pools where I don't want compression or is the pool default
> still "none"? Right now, I seem to observe that compression is still
> disabled by default.
>
> 3) Do you know what the output means? What is the compression ratio?
> bluestore_compressed/bluestore_compressed_original=0.04 or
> bluestore_compressed_allocated/bluestore_compressed_original=0.5? The
> second ratio does not look too impressive given the file contents.
>
> 4) Is there any way to get uncompressed data compressed as a background
> task like scrub?
>
> If you have the time to look at these questions, this would be great. Most
> importantly right now is that I got it to work.
>
> Thanks for your help,
>
> =
> Frank Schilder
> AIT Risø Campus
> Bygning 109, rum S14
>
> 
> From: ceph-users  on behalf of Frank
> Schilder 
> Sent: 12 October 2018 17:00
> To: David Turner
> Cc: ceph-users@lists.ceph.com
> Subject: Re: [ceph-users] bluestore compression enabled but no data
> compressed
>
> Hi David,
>
> thanks, now I see what you mean. If you are right, that would mean that
> the documentation is wrong. Under "
> http://docs.ceph.com/docs/master/rados/operations/pools/#set-pool-values;
> is stated that "Sets inline compression algorithm to use for underlying
> BlueStore. This setting overrides the global setting of bluestore
> compression algorithm". In other words, the global setting should be
> irrelevant if compression is enabled on a pool.
>
> Well, I will try how setting both to "aggressive" or "force" works out and
> let you know.
>
> Thanks and have a nice weekend,
>
> =
> Frank Schilder
> AIT Risø Campus
> Bygning 109, rum S14
>
> 
> From: David Turner 
> Sent: 12 October 2018 16:50:31
> To: Frank Schilder
> Cc: ceph-users@lists.ceph.com
> Subject: Re: [ceph-users]

Re: [ceph-users] Troubleshooting hanging storage backend whenever there is any cluster change

2018-10-18 Thread David Turner
What are your OSD node stats?  CPU, RAM, quantity and size of OSD disks?
You might need to modify some bluestore settings to speed up the time it
takes to peer, or perhaps you are just underpowering the number of OSD
disks you're running and your servers and OSD daemons are going as fast as
they can.
On Sat, Oct 13, 2018 at 4:08 PM Stefan Priebe - Profihost AG <
s.pri...@profihost.ag> wrote:

> and a 3rd one:
>
> health: HEALTH_WARN
> 1 MDSs report slow metadata IOs
> 1 MDSs report slow requests
>
> 2018-10-13 21:44:08.150722 mds.cloud1-1473 [WRN] 7 slow requests, 1
> included below; oldest blocked for > 199.922552 secs
> 2018-10-13 21:44:08.150725 mds.cloud1-1473 [WRN] slow request 34.829662
> seconds old, received at 2018-10-13 21:43:33.321031:
> client_request(client.216121228:929114 lookup #0x1/.active.lock
> 2018-10-13 21:43:33.321594 caller_uid=0, caller_gid=0{}) currently
> failed to rdlock, waiting
>
> The relevant OSDs are bluestore again running at 100% I/O:
>
> iostat shows:
> sdi  77,00 0,00  580,00   97,00 511032,00   972,00
> 1512,5714,88   22,05   24,576,97   1,48 100,00
>
> so it reads with 500MB/s which completely saturates the osd. And it does
> for > 10 minutes.
>
> Greets,
> Stefan
>
> Am 13.10.2018 um 21:29 schrieb Stefan Priebe - Profihost AG:
> >
> > ods.19 is a bluestore osd on a healthy 2TB SSD.
> >
> > Log of osd.19 is here:
> > https://pastebin.com/raw/6DWwhS0A
> >
> > Am 13.10.2018 um 21:20 schrieb Stefan Priebe - Profihost AG:
> >> Hi David,
> >>
> >> i think this should be the problem - form a new log from today:
> >>
> >> 2018-10-13 20:57:20.367326 mon.a [WRN] Health check update: 4 osds down
> >> (OSD_DOWN)
> >> ...
> >> 2018-10-13 20:57:41.268674 mon.a [WRN] Health check update: Reduced data
> >> availability: 3 pgs peering (PG_AVAILABILITY)
> >> ...
> >> 2018-10-13 20:58:08.684451 mon.a [WRN] Health check failed: 1 osds down
> >> (OSD_DOWN)
> >> ...
> >> 2018-10-13 20:58:22.841210 mon.a [WRN] Health check failed: Reduced data
> >> availability: 8 pgs inactive (PG_AVAILABILITY)
> >> 
> >> 2018-10-13 20:58:47.570017 mon.a [WRN] Health check update: Reduced data
> >> availability: 5 pgs inactive (PG_AVAILABILITY)
> >> ...
> >> 2018-10-13 20:58:49.142108 osd.19 [WRN] Monitor daemon marked osd.19
> >> down, but it is still running
> >> 2018-10-13 20:58:53.750164 mon.a [WRN] Health check update: Reduced data
> >> availability: 3 pgs inactive (PG_AVAILABILITY)
> >> ...
> >>
> >> so there is a timeframe of > 90s whee PGs are inactive and unavail -
> >> this would at least explain stalled I/O to me?
> >>
> >> Greets,
> >> Stefan
> >>
> >>
> >> Am 12.10.2018 um 15:59 schrieb David Turner:
> >>> The PGs per OSD does not change unless the OSDs are marked out.  You
> >>> have noout set, so that doesn't change at all during this test.  All of
> >>> your PGs peered quickly at the beginning and then were
> active+undersized
> >>> the rest of the time, you never had any blocked requests, and you
> always
> >>> had 100MB/s+ client IO.  I didn't see anything wrong with your cluster
> >>> to indicate that your clients had any problems whatsoever accessing
> data.
> >>>
> >>> Can you confirm that you saw the same problems while you were running
> >>> those commands?  The next thing would seem that possibly a client isn't
> >>> getting an updated OSD map to indicate that the host and its OSDs are
> >>> down and it's stuck trying to communicate with host7.  That would
> >>> indicate a potential problem with the client being unable to
> communicate
> >>> with the Mons maybe?  Have you completely ruled out any network
> problems
> >>> between all nodes and all of the IPs in the cluster.  What does your
> >>> client log show during these times?
> >>>
> >>> On Fri, Oct 12, 2018 at 8:35 AM Nils Fahldieck - Profihost AG
> >>> mailto:n.fahldi...@profihost.ag>> wrote:
> >>>
> >>> Hi, in our `ceph.conf` we have:
> >>>
> >>>   mon_max_pg_per_osd = 300
> >>>
> >>> While the host is offline (9 OSDs down):
> >>>
> >>>   4352 PGs * 3 / 62 OSDs ~ 210 PGs per OSD
> >>>
> >>> If all OSDs are online:
> >>>
> >>>   4352 PGs * 

Re: [ceph-users] ceph pg/pgp number calculation

2018-10-18 Thread David Turner
Not all pools need the same number of PGs. When you get to that many pools
you want to start calculating how much data each pool will have. If 1 of
your pools will have 80% of your data in it, it should have 80% of your
PGs. The metadata pools for rgw likely won't need more than 8 or so PGs
each. If your rgw data pool is only going to have a little scratch data,
then it won't need very many PGs either.
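For example (numbers purely illustrative): with 60 OSDs, 3x replication and a
target of ~100 PGs per OSD, the total budget is roughly 60 * 100 / 3 = 2000
PGs; if RBD and CephFS hold ~90% of the data, give those pools ~1800 of the
budget and leave the small RGW metadata pools at 8 PGs each.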

On Tue, Oct 16, 2018, 3:35 AM Zhenshi Zhou  wrote:

> Hi,
>
> I have a cluster serving rbd and cephfs storage for a period of
> time. I added rgw in the cluster yesterday and wanted it to server
> object storage. Everything seems good.
>
> What I'm confused is how to calculate the pg/pgp number. As we
> all know, the formula of calculating pgs is:
>
> Total PGs = ((Total_number_of_OSD * 100) / max_replication_count) /
> pool_count
>
> Before I created rgw, the cluster had 3 pools(rbd, cephfs_data,
> cephfs_meta).
> But now it has 8 pools, which object service may use, including
> '.rgw.root',
> 'default.rgw.control', 'default.rgw.meta', 'default.rgw.log' and
> 'defualt.rgw.buckets.index'.
>
> Should I calculate pg number again using new pool number as 8, or should I
> continue to use the old pg number?
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] SSD for MON/MGR/MDS

2018-10-15 Thread David Turner
Mgr and MDS do not use physical space on a disk. Mons do use the disk and
benefit from SSDs, but they write a lot of stuff all the time. Depending
why the SSDs aren't suitable for OSDs, they might not be suitable for mons
either.

On Mon, Oct 15, 2018, 7:16 AM ST Wong (ITSC)  wrote:

> Hi all,
>
>
>
> We’ve got some servers with some small size SSD but no hard disks other
> than system disks.  While they’re not suitable for OSD, will the SSD be
> useful for running MON/MGR/MDS?
>
>
>
> Thanks a lot.
>
> Regards,
>
> /st wong
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] bluestore compression enabled but no data compressed

2018-10-12 Thread David Turner
If you go down just a little farther you'll see the settings that you put
into your ceph.conf under the osd section (although I'd probably do
global).  That's where the OSDs get the settings from.  As a note, once
these are set, future writes will be compressed (provided they match the
compression settings you can see there, such as minimum ratios and blob
sizes).  To compress current data, you need to re-write it.
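
As a sketch of what that looks like (the option names are the ones from your
--show-config dump; aggressive is just an example, and the OSDs need a
restart, or an injectargs, for the change to take effect):

  [global]
  bluestore_compression_mode = aggressive
  bluestore_compression_algorithm = snappy
  # defaults you already have, shown for reference:
  # bluestore_compression_required_ratio = 0.875
  # bluestore_compression_min_blob_size_hdd = 131072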

On Fri, Oct 12, 2018 at 10:41 AM Frank Schilder  wrote:

> Hi David,
>
> thanks for your quick answer. When I look at both references, I see
> exactly the same commands:
>
> ceph osd pool set {pool-name} {key} {value}
>
> where on one page only keys specific for compression are described. This
> is the command I found and used. However, I can't see any compression
> happening. If you know about something else than "ceph osd pool set" -
> commands, please let me know.
>
> Best regards,
>
> =
> Frank Schilder
> AIT Risø Campus
> Bygning 109, rum S14
>
> 
> From: David Turner 
> Sent: 12 October 2018 15:47:20
> To: Frank Schilder
> Cc: ceph-users@lists.ceph.com
> Subject: Re: [ceph-users] bluestore compression enabled but no data
> compressed
>
> It's all of the settings that you found in your first email when you
> dumped the configurations and such.
> http://docs.ceph.com/docs/master/rados/configuration/bluestore-config-ref/#inline-compression
>
> On Fri, Oct 12, 2018 at 7:36 AM Frank Schilder <fr...@dtu.dk> wrote:
> Hi David,
>
> thanks for your answer. I did enable compression on the pools as described
> in the link you sent below (ceph osd pool set sr-fs-data-test
> compression_mode aggressive, I also tried force to no avail). However, I
> could not find anything on enabling compression per OSD. Could you possibly
> provide a source or sample commands?
>
> Thanks and best regards,
>
> =
> Frank Schilder
> AIT Risø Campus
> Bygning 109, rum S14
>
> 
> From: David Turner <drakonst...@gmail.com>
> Sent: 09 October 2018 17:42
> To: Frank Schilder
> Cc: ceph-users@lists.ceph.com
> Subject: Re: [ceph-users] bluestore compression enabled but no data
> compressed
>
> When I've tested compression before there are 2 places you need to
> configure compression.  On the OSDs in the configuration settings that you
> mentioned, but also on the [1] pools themselves.  If you have the
> compression mode on the pools set to none, then it doesn't matter what the
> OSDs configuration is and vice versa unless you are using the setting of
> force.  If you want to default compress everything, set pools to passive
> and osds to aggressive.  If you want to only compress specific pools, set
> the osds to passive and the specific pools to aggressive.  Good luck.
>
>
> [1]
> http://docs.ceph.com/docs/mimic/rados/operations/pools/#set-pool-values
>
> On Tue, Sep 18, 2018 at 7:11 AM Frank Schilder <fr...@dtu.dk> wrote:
> I seem to have a problem getting bluestore compression to do anything. I
> followed the documentation and enabled bluestore compression on various
> pools by executing "ceph osd pool set  compression_mode
> aggressive". Unfortunately, it seems like no data is compressed at all. As
> an example, below is some diagnostic output for a data pool used by a
> cephfs:
>
> [root@ceph-01 ~]# ceph --version
> ceph version 12.2.7 (3ec878d1e53e1aeb47a9f619c49d9e7c0aa384d5) luminous
> (stable)
>
> All defaults are OK:
>
> [root@ceph-01 ~]# ceph --show-config | grep compression
> [...]
> bluestore_compression_algorithm = snappy
> bluestore_compression_max_blob_size = 0
> bluestore_compression_max_blob_size_hdd = 524288
> bluestore_compression_max_blob_size_ssd = 65536
> bluestore_compression_min_blob_size = 0
> bluestore_compression_min_blob_size_hdd = 131072
> bluestore_compression_min_blob_size_ssd = 8192
> bluestore_compression_mode = none
> bluestore_compression_required_ratio = 0.875000
> [...]
>
> Compression is reported as enabled:
>
> [root@ceph-01 ~]# ceph osd pool ls detail
> [...]
> pool 24 'sr-fs-data-test' erasure size 8 min_size 7 crush_rule 10
> object_hash rjenkins pg_num 50 pgp_num 50 last_change 7726 flags
> hashpspool,ec_overwrites stripe_width 24576 compression_algorithm snappy
> compression_mode aggressive application cephfs
> [...]
>
> [root@ceph-01 ~]# ceph osd pool get sr-fs-data-test compression_mode
> compression_mode: aggressive
> [root@ceph-01 ~]# ceph osd pool get sr-fs-data-test compression_algorithm
> compression_algorithm: snappy

Re: [ceph-users] Anyone tested Samsung 860 DCT SSDs?

2018-10-12 Thread David Turner
What do you want to use these for?  "5 Year or 0.2 DWPD" is the durability
of this drive, which is absolutely awful for almost every use in Ceph.
Possibly if you're using these for data disks (not DB or WAL) and you plan
to have a more durable media to host the DB+WAL on... this could work.  Or
if you're just doing archival storage... but then you should be using much
cheaper spinners.  Back in the days of Filestore and SSD Journals I had
some disks that had 0.3 DWPD and I had to replace all of the disks in under
a year because they ran out of writes.
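
To put 0.2 DWPD in numbers (assuming the 960GB model as an example; scale
for the capacity you're actually looking at):

  0.2 drive writes/day * 960 GB     = ~192 GB of writes per day
  192 GB/day * 365 days * 5 years   = ~350 TB total endurance

Spread that across write amplification and recovery traffic and it can get
used up surprisingly quickly.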

On Fri, Oct 12, 2018 at 9:55 AM Kenneth Van Alstyne <
kvanalst...@knightpoint.com> wrote:

> Cephers:
> As the subject suggests, has anyone tested Samsung 860 DCT SSDs?
> They are really inexpensive and we are considering buying some to test.
>
> Thanks,
>
> --
> Kenneth Van Alstyne
> Systems Architect
> Knight Point Systems, LLC
> Service-Disabled Veteran-Owned Business
> 1775 Wiehle Avenue Suite 101 | Reston, VA 20190
> 
> c: 228-547-8045 <(228)%20547-8045> f: 571-266-3106 <(571)%20266-3106>
> www.knightpoint.com
> DHS EAGLE II Prime Contractor: FC1 SDVOSB Track
> GSA Schedule 70 SDVOSB: GS-35F-0646S
> GSA MOBIS Schedule: GS-10F-0404Y
> ISO 2 / ISO 27001 / CMMI Level 3
>
> Notice: This e-mail message, including any attachments, is for the sole
> use of the intended recipient(s) and may contain confidential and
> privileged information. Any unauthorized review, copy, use, disclosure, or
> distribution is STRICTLY prohibited. If you are not the intended recipient,
> please contact the sender by reply e-mail and destroy all copies of the
> original message.
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Troubleshooting hanging storage backend whenever there is any cluster change

2018-10-12 Thread David Turner
The number of PGs per OSD does not change unless the OSDs are marked out.  You have
noout set, so that doesn't change at all during this test.  All of your PGs
peered quickly at the beginning and then were active+undersized the rest of
the time, you never had any blocked requests, and you always had 100MB/s+
client IO.  I didn't see anything wrong with your cluster to indicate that
your clients had any problems whatsoever accessing data.

Can you confirm that you saw the same problems while you were running those
commands?  The next thing would seem that possibly a client isn't getting
an updated OSD map to indicate that the host and its OSDs are down and it's
stuck trying to communicate with host7.  That would indicate a potential
problem with the client being unable to communicate with the Mons maybe?
Have you completely ruled out any network problems between all nodes and
all of the IPs in the cluster?  What does your client log show during these
times?
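
If the librbd clients have an admin socket enabled in their ceph.conf (it is
not on by default in most hypervisor setups; something like
admin socket = /var/run/ceph/$cluster-$type.$id.$pid.$cctid.asok under a
[client] section enables it), you can check whether a hung client is stuck
waiting on a specific OSD with:

  ceph daemon /var/run/ceph/ceph-client.<name>.<pid>.<cctid>.asok objecter_requests

Requests stuck pointing at OSDs on host7 would suggest a stale osdmap or a
network path problem on the client side rather than a cluster-side issue.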

On Fri, Oct 12, 2018 at 8:35 AM Nils Fahldieck - Profihost AG <
n.fahldi...@profihost.ag> wrote:

> Hi, in our `ceph.conf` we have:
>
>   mon_max_pg_per_osd = 300
>
> While the host is offline (9 OSDs down):
>
>   4352 PGs * 3 / 62 OSDs ~ 210 PGs per OSD
>
> If all OSDs are online:
>
>   4352 PGs * 3 / 71 OSDs ~ 183 PGs per OSD
>
> ... so this doesn't seem to be the issue.
>
> If I understood you right, that's what you've meant. If I got you wrong,
> would you mind to point to one of those threads you mentioned?
>
> Thanks :)
>
> > On 12.10.2018 at 14:03, Burkhard Linke wrote:
> > Hi,
> >
> >
> > On 10/12/2018 01:55 PM, Nils Fahldieck - Profihost AG wrote:
> >> I rebooted a Ceph host and logged `ceph status` & `ceph health detail`
> >> every 5 seconds. During this I encountered 'PG_AVAILABILITY Reduced data
> >> availability: pgs peering'. At the same time some VMs hung as described
> >> before.
> >
> > Just a wild guess... you have 71 OSDs and about 4500 PG with size=3.
> > 13500 PG instance overall, resulting in ~190 PGs per OSD under normal
> > circumstances.
> >
> > If one host is down and the PGs have to re-peer, you might reach the
> > limit of 200 PG/OSDs on some of the OSDs, resulting in stuck peering.
> >
> > You can try to raise this limit. There are several threads on the
> > mailing list about this.
> >
> > Regards,
> > Burkhard
> >
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] bluestore compression enabled but no data compressed

2018-10-12 Thread David Turner
It's all of the settings that you found in your first email when you dumped
the configurations and such.
http://docs.ceph.com/docs/master/rados/configuration/bluestore-config-ref/#inline-compression

On Fri, Oct 12, 2018 at 7:36 AM Frank Schilder  wrote:

> Hi David,
>
> thanks for your answer. I did enable compression on the pools as described
> in the link you sent below (ceph osd pool set sr-fs-data-test
> compression_mode aggressive, I also tried force to no avail). However, I
> could not find anything on enabling compression per OSD. Could you possibly
> provide a source or sample commands?
>
> Thanks and best regards,
>
> =
> Frank Schilder
> AIT Risø Campus
> Bygning 109, rum S14
>
> ____
> From: David Turner 
> Sent: 09 October 2018 17:42
> To: Frank Schilder
> Cc: ceph-users@lists.ceph.com
> Subject: Re: [ceph-users] bluestore compression enabled but no data
> compressed
>
> When I've tested compression before there are 2 places you need to
> configure compression.  On the OSDs in the configuration settings that you
> mentioned, but also on the [1] pools themselves.  If you have the
> compression mode on the pools set to none, then it doesn't matter what the
> OSDs configuration is and vice versa unless you are using the setting of
> force.  If you want to default compress everything, set pools to passive
> and osds to aggressive.  If you want to only compress specific pools, set
> the osds to passive and the specific pools to aggressive.  Good luck.
>
>
> [1]
> http://docs.ceph.com/docs/mimic/rados/operations/pools/#set-pool-values
>
> On Tue, Sep 18, 2018 at 7:11 AM Frank Schilder <fr...@dtu.dk> wrote:
> I seem to have a problem getting bluestore compression to do anything. I
> followed the documentation and enabled bluestore compression on various
> pools by executing "ceph osd pool set  compression_mode
> aggressive". Unfortunately, it seems like no data is compressed at all. As
> an example, below is some diagnostic output for a data pool used by a
> cephfs:
>
> [root@ceph-01 ~]# ceph --version
> ceph version 12.2.7 (3ec878d1e53e1aeb47a9f619c49d9e7c0aa384d5) luminous
> (stable)
>
> All defaults are OK:
>
> [root@ceph-01 ~]# ceph --show-config | grep compression
> [...]
> bluestore_compression_algorithm = snappy
> bluestore_compression_max_blob_size = 0
> bluestore_compression_max_blob_size_hdd = 524288
> bluestore_compression_max_blob_size_ssd = 65536
> bluestore_compression_min_blob_size = 0
> bluestore_compression_min_blob_size_hdd = 131072
> bluestore_compression_min_blob_size_ssd = 8192
> bluestore_compression_mode = none
> bluestore_compression_required_ratio = 0.875000
> [...]
>
> Compression is reported as enabled:
>
> [root@ceph-01 ~]# ceph osd pool ls detail
> [...]
> pool 24 'sr-fs-data-test' erasure size 8 min_size 7 crush_rule 10
> object_hash rjenkins pg_num 50 pgp_num 50 last_change 7726 flags
> hashpspool,ec_overwrites stripe_width 24576 compression_algorithm snappy
> compression_mode aggressive application cephfs
> [...]
>
> [root@ceph-01 ~]# ceph osd pool get sr-fs-data-test compression_mode
> compression_mode: aggressive
> [root@ceph-01 ~]# ceph osd pool get sr-fs-data-test compression_algorithm
> compression_algorithm: snappy
>
> We dumped a 4Gib file with dd from /dev/zero. Should be easy to compress
> with excellent ratio. Search for a PG:
>
> [root@ceph-01 ~]# ceph pg ls-by-pool sr-fs-data-test
> PG_STAT OBJECTS MISSING_ON_PRIMARY DEGRADED MISPLACED UNFOUND BYTES
>  LOG DISK_LOG STATESTATE_STAMPVERSION  REPORTED UP
>  UP_PRIMARY ACTING   ACTING_PRIMARY
> LAST_SCRUB SCRUB_STAMPLAST_DEEP_SCRUB DEEP_SCRUB_STAMP
> 24.0 15  00 0   0  62914560
> 77   77 active+clean 2018-09-14 01:07:14.593007  7698'77 7735:142
> [53,47,36,30,14,55,57,5] 53 [53,47,36,30,14,55,57,5]
>  537698'77 2018-09-14 01:07:14.592966 0'0 2018-09-11
> 08:06:29.309010
>
> There is about 250MB data on the primary OSD, but noting seems to be
> compressed:
>
> [root@ceph-07 ~]# ceph daemon osd.53 perf dump | grep blue
> [...]
> "bluestore_allocated": 313917440,
> "bluestore_stored": 264362803,
> "bluestore_compressed": 0,
> "bluestore_compressed_allocated": 0,
> "bluestore_compressed_original": 0,
> [...]
>
> Just to make sure, I checked one of the objects' contents:
>
> [root@ceph-01 ~]# rados ls -p sr-fs-data-test
> 104.039c
> [...]
> 1

Re: [ceph-users] Troubleshooting hanging storage backend whenever there is any cluster change

2018-10-11 Thread David Turner
~1,d83ca~1,d83cc~1,d83ce~1,d83d0~1,d83d2~6,d83d9~3,d83df~1,d83e1~2,d83e5~1,d83e8~1,d83eb~4,d83f0~1,d83f2~1,d83f4~3,d83f8~3,d83fd~2,d8402~1,d8405~1,d8407~1,d840a~2,d840f~1,d8411~1,d8413~3,d8417~3,d841c~4,d8422~4,d8428~2,d842b~1,d842e~1,d8430~1,d8432~5,d843a~1,d843c~3,d8440~5,d8447~1,d844a~1,d844d~1,d844f~1,d8452~1,d8455~1,d8457~1,d8459~2,d845d~2,d8460~1,d8462~3,d8467~1,d8469~1,d846b~2,d846e~2,d8471~4,d8476~6,d847d~3,d8482~1,d8484~1,d8486~2,d8489~2,d848c~1,d848e~1,d8491~4,d8499~1,d849c~3,d84a0~1,d84a2~1,d84a4~3,d84aa~2,d84ad~2,d84b1~4,d84b6~1,d84b8~1,d84ba~1,d84bc~1,d84be~1,d84c0~5,d84c7~4,d84ce~1,d84d0~1,d84d2~2,d84d6~2,d84db~1,d84dd~2,d84e2~2,d84e6~1,d84e9~1,d84eb~4,d84f0~4]
> pool 6 'cephfs_cephstor1_data' replicated size 3 min_size 1 crush_rule 0
> object_hash rjenkins pg_num 128 pgp_num 128 last_change 1214952 flags
> hashpspool stripe_width 0 application cephfs
> pool 7 'cephfs_cephstor1_metadata' replicated size 3 min_size 1
> crush_rule 0 object_hash rjenkins pg_num 128 pgp_num 128 last_change
> 1214952 flags hashpspool stripe_width 0 application cephfs
>
> On 11.10.2018 at 20:47, David Turner wrote:
> > My first guess is to ask what your crush rules are.  `ceph osd crush
> > rule dump` along with `ceph osd pool ls detail` would be helpful.  Also
> > if you have a `ceph status` output from a time where the VM RBDs aren't
> > working might explain something.
> >
> > On Thu, Oct 11, 2018 at 1:12 PM Nils Fahldieck - Profihost AG
> > <n.fahldi...@profihost.ag> wrote:
> >
> > Hi everyone,
> >
> > since some time we experience service outages in our Ceph cluster
> > whenever there is any change to the HEALTH status. E. g. swapping
> > storage devices, adding storage devices, rebooting Ceph hosts, during
> > backfills etc.
> >
> > Just now I had a recent situation, where several VMs hung after I
> > rebooted one Ceph host. We have 3 replications for each PG, 3 mon, 3
> > mgr, 3 mds and 71 osds spread over 9 hosts.
> >
> > We use Ceph as a storage backend for our Proxmox VE (PVE)
> environment.
> > The outages are in the form of blocked virtual file systems of those
> > virtual machines running in our PVE cluster.
> >
> > It feels similar to stuck and inactive PGs to me. Honestly though I'm
> > not really sure how to debug this problem or which log files to
> > examine.
> >
> > OS: Debian 9
> > Kernel: 4.12 based upon SLE15-SP1
> >
> > # ceph version
> > ceph version 12.2.8-133-gded2f6836f
> > (ded2f6836f6331a58f5c817fca7bfcd6c58795aa) luminous (stable)
> >
> > Can someone guide me? I'm more than happy to provide more information
> > as needed.
> >
> > Thanks in advance
> > Nils
> > ___
> > ceph-users mailing list
> > ceph-users@lists.ceph.com
> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> >
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Troubleshooting hanging storage backend whenever there is any cluster change

2018-10-11 Thread David Turner
My first guess is to ask what your crush rules are.  `ceph osd crush rule
dump` along with `ceph osd pool ls detail` would be helpful.  Also if you
have a `ceph status` output from a time where the VM RBDs aren't working
might explain something.

On Thu, Oct 11, 2018 at 1:12 PM Nils Fahldieck - Profihost AG <
n.fahldi...@profihost.ag> wrote:

> Hi everyone,
>
> since some time we experience service outages in our Ceph cluster
> whenever there is any change to the HEALTH status. E. g. swapping
> storage devices, adding storage devices, rebooting Ceph hosts, during
> backfills etc.
>
> Just now I had a recent situation, where several VMs hung after I
> rebooted one Ceph host. We have 3 replications for each PG, 3 mon, 3
> mgr, 3 mds and 71 osds spread over 9 hosts.
>
> We use Ceph as a storage backend for our Proxmox VE (PVE) environment.
> The outages are in the form of blocked virtual file systems of those
> virtual machines running in our PVE cluster.
>
> It feels similar to stuck and inactive PGs to me. Honestly though I'm
> not really sure how to debug this problem or which log files to examine.
>
> OS: Debian 9
> Kernel: 4.12 based upon SLE15-SP1
>
> # ceph version
> ceph version 12.2.8-133-gded2f6836f
> (ded2f6836f6331a58f5c817fca7bfcd6c58795aa) luminous (stable)
>
> Can someone guide me? I'm more than happy to provide more information
> as needed.
>
> Thanks in advance
> Nils
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Inconsistent PG, repair doesn't work

2018-10-11 Thread David Turner
Part of a repair is queuing a deep scrub. As soon as the repair part is
over, the deep scrub continues until it is done.

On Thu, Oct 11, 2018, 12:26 PM Brett Chancellor 
wrote:

> Does the "repair" function use the same rules as a deep scrub? I couldn't
> get one to kick off, until I temporarily increased the max_scrubs and
> lowered the scrub_min_interval on all 3 OSDs for that placement group. This
> ended up fixing the issue, so I'll leave this here in case somebody else
> runs into it.
>
> sudo ceph tell 'osd.208' injectargs '--osd_max_scrubs 3'
> sudo ceph tell 'osd.120' injectargs '--osd_max_scrubs 3'
> sudo ceph tell 'osd.235' injectargs '--osd_max_scrubs 3'
> sudo ceph tell 'osd.208' injectargs '--osd_scrub_min_interval 1.0'
> sudo ceph tell 'osd.120' injectargs '--osd_scrub_min_interval 1.0'
> sudo ceph tell 'osd.235' injectargs '--osd_scrub_min_interval 1.0'
> sudo ceph pg repair 75.302
>
> -Brett
>
>
> On Thu, Oct 11, 2018 at 8:42 AM Maks Kowalik 
> wrote:
>
>> Imho moving was not the best idea (a copying attempt would have told you
>> whether a read error was the cause here).
>> Scrubs might not want to start if there are many other scrubs ongoing.
>>
>> On Thu, 11 Oct 2018 at 14:27, Brett Chancellor 
>> wrote:
>>
>>> I moved the file. But the cluster won't actually start any scrub/repair
>>> I manually initiate.
>>>
>>> On Thu, Oct 11, 2018, 7:51 AM Maks Kowalik 
>>> wrote:
>>>
 Based on the log output it looks like you're having a damaged file on
 OSD 235 where the shard is stored.
 To ensure if that's the case you should find the file (using
 81d5654895863d as a part of its name) and try to copy it to another
 directory.
 If you get the I/O error while copying, the next steps would be to
 delete the file, run the scrub on 75.302 and take a deep look at the
 OSD.235 for any other errors.

 Kind regards,
 Maks

>>> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] https://ceph-storage.slack.com

2018-10-11 Thread David Turner
I have 4 other slack servers that I'm in for work and personal hobbies.
It's just easier for me to maintain one more slack server than have a
separate application for IRC.

On Thu, Oct 11, 2018, 11:02 AM John Spray  wrote:

> On Thu, Oct 11, 2018 at 8:44 AM Marc Roos 
> wrote:
> >
> >
> > Why slack anyway?
>
> Just because some people like using it.  Don't worry, IRC is still the
> primary channel and lots of people don't use slack.  I'm not on slack,
> for example, which is either a good or bad thing depending on your
> perspective :-D
>
> John
>
> >
> >
> >
> >
> > -Original Message-
> > From: Konstantin Shalygin [mailto:k0...@k0ste.ru]
> > Sent: donderdag 11 oktober 2018 5:11
> > To: ceph-users@lists.ceph.com
> > Subject: *SPAM* Re: [ceph-users] https://ceph-storage.slack.com
> >
> > > why would a ceph slack be invite only?
> >
> > Because this is not Telegram.
> >
> >
> >
> > k
> >
> > ___
> > ceph-users mailing list
> > ceph-users@lists.ceph.com
> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> >
> >
> > ___
> > ceph-users mailing list
> > ceph-users@lists.ceph.com
> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] OSD log being spammed with BlueStore stupidallocator dump

2018-10-10 Thread David Turner
Not a resolution, but an idea that you've probably thought of.  Disabling
logging on any affected OSDs (possibly just all of them) seems like a
needed step to be able to keep working with this cluster to finish the
upgrade and get it healthier.

On Wed, Oct 10, 2018 at 6:37 PM Wido den Hollander  wrote:

>
>
> On 10/11/2018 12:08 AM, Wido den Hollander wrote:
> > Hi,
> >
> > On a Luminous cluster running a mix of 12.2.4, 12.2.5 and 12.2.8 I'm
> > seeing OSDs writing heavily to their logfiles spitting out these lines:
> >
> >
> > 2018-10-10 21:52:04.019037 7f90c2f0f700  0 stupidalloc 0x0x55828ae047d0
> > dump  0x15cd2078000~34000
> > 2018-10-10 21:52:04.019038 7f90c2f0f700  0 stupidalloc 0x0x55828ae047d0
> > dump  0x15cd22cc000~24000
> > 2018-10-10 21:52:04.019038 7f90c2f0f700  0 stupidalloc 0x0x55828ae047d0
> > dump  0x15cd230~2
> > 2018-10-10 21:52:04.019039 7f90c2f0f700  0 stupidalloc 0x0x55828ae047d0
> > dump  0x15cd2324000~24000
> > 2018-10-10 21:52:04.019040 7f90c2f0f700  0 stupidalloc 0x0x55828ae047d0
> > dump  0x15cd26c~24000
> > 2018-10-10 21:52:04.019041 7f90c2f0f700  0 stupidalloc 0x0x55828ae047d0
> > dump  0x15cd2704000~3
> >
> > It goes so fast that the OS-disk in this case can't keep up and becomes
> > 100% utilized.
> >
> > This causes the OSD to slow down and cause slow requests and starts to
> flap.
> >
> > It seems that this is *only* happening on OSDs which are the fullest
> > (~85%) on this cluster and they have about ~400 PGs each (Yes, I know,
> > that's high).
> >
>
> After some searching I stumbled upon this Bugzilla report:
> https://bugzilla.redhat.com/show_bug.cgi?id=1600138
>
> That seems to be the same issue, although I'm not 100% sure.
>
> Wido
>
> > Looking at StupidAllocator.cc I see this piece of code:
> >
> > void StupidAllocator::dump()
> > {
> >   std::lock_guard l(lock);
> >   for (unsigned bin = 0; bin < free.size(); ++bin) {
> > ldout(cct, 0) << __func__ << " free bin " << bin << ": "
> >   << free[bin].num_intervals() << " extents" << dendl;
> > for (auto p = free[bin].begin();
> >  p != free[bin].end();
> >  ++p) {
> >   ldout(cct, 0) << __func__ << "  0x" << std::hex << p.get_start()
> > << "~"
> > << p.get_len() << std::dec << dendl;
> > }
> >   }
> > }
> >
> > I'm just wondering why it would spit out these lines and what's causing
> it.
> >
> > Has anybody seen this before?
> >
> > Wido
> > ___
> > ceph-users mailing list
> > ceph-users@lists.ceph.com
> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> >
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] https://ceph-storage.slack.com

2018-10-10 Thread David Turner
I would like an invite to.  drakonst...@gmail.com

On Wed, Sep 19, 2018 at 1:02 PM Gregory Farnum  wrote:

> Done. :)
>
> On Tue, Sep 18, 2018 at 12:15 PM Alfredo Daniel Rezinovsky <
> alfredo.rezinov...@ingenieria.uncuyo.edu.ar> wrote:
>
>> Can anyone add me to this slack?
>>
>> with my email alfrenov...@gmail.com
>>
>> Thanks.
>>
>> --
>> Alfredo Daniel Rezinovsky
>> Director de Tecnologías de Información y Comunicaciones
>> Facultad de Ingeniería - Universidad Nacional de Cuyo
>>
>> ___
>> ceph-users mailing list
>> ceph-users@lists.ceph.com
>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] HEALTH_WARN 2 osd(s) have {NOUP, NODOWN, NOIN, NOOUT} flags set

2018-10-10 Thread David Turner
There is a newer [1] feature to be able to set flags per OSD instead of
cluster wide.  This way you can prevent a problem host from marking its
OSDs down while the rest of the cluster is capable of doing so.  [2] These
commands ought to clear up your status.

[1]
http://docs.ceph.com/docs/master/rados/operations/health-checks/#osd-flags

[2] ceph osd rm-noin 3
ceph osd rm-noin 5
ceph osd rm-noin 10
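
For reference, a sketch of the full per-OSD flag workflow (osd ids follow
the example above; the add-* commands are how such flags typically get set
in the first place):

  ceph osd add-noout 3     # set noout on just osd.3
  ceph osd rm-noout 3      # and clear it again
  ceph osd rm-noin 3
  ceph health detail       # confirms which OSDs still carry flags

The cluster-wide 'ceph osd unset noin' you ran only clears the global flag,
not the per-OSD ones.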

On Tue, Oct 9, 2018 at 1:49 PM Rafael Montes  wrote:

> Hello everyone,
>
>
> I am getting warning messages regarding 3osd's  with noin and noout flags
> set. The osd are in up state.I have run the ceph osd unset noin  on the
> cluster and it does not seem to clear the flags. I have attached status
> files for the cluster.
>
>
> The cluster is running  deepsea-0.8.6-2.21.1.noarch and
> ceph-12.2.8+git.1536505967.080f2248ff-2.15.1.x86_64.
>
>
>
> Has anybody run into this issue and if so how was it resolved?
>
>
> Thanks
>
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Does anyone use interactive CLI mode?

2018-10-10 Thread David Turner
I know that it existed, but I've never bothered using it.  In applications
like Python, where you get a different experience by interacting with it
line by line and building up state in the session, it is very helpful.  Ceph,
however, doesn't keep any such session state that would make this mode more
useful than the traditional CLI.

On Wed, Oct 10, 2018 at 10:20 AM John Spray  wrote:

> Hi all,
>
> Since time immemorial, the Ceph CLI has had a mode where when run with
> no arguments, you just get an interactive prompt that lets you run
> commands without "ceph" at the start.
>
> I recently discovered that we actually broke this in Mimic[1], and it
> seems that nobody noticed!
>
> So the question is: does anyone actually use this feature?  It's not
> particularly expensive to maintain, but it might be nice to have one
> less path through the code if this is entirely unused.
>
> Cheers,
> John
>
> 1. https://github.com/ceph/ceph/pull/24521
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Can't remove DeleteMarkers in rgw bucket

2018-10-09 Thread David Turner
I would suggest trying to delete the bucket using radosgw-admin.  If you
can't get that to work, then I would go towards deleting the actual RADOS
objects.  There are a few threads on the ML that talk about manually
deleting a bucket.
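
A sketch of the radosgw-admin route (the bucket name is a placeholder; test
on something non-critical first, since --bypass-gc deletes the underlying
RADOS objects without going through garbage collection):

  radosgw-admin bucket rm --bucket=<bucket-name> --purge-objects
  # if index entries or delete markers keep the bucket alive and your
  # version supports it:
  radosgw-admin bucket rm --bucket=<bucket-name> --purge-objects --bypass-gc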

On Thu, Sep 20, 2018 at 2:04 PM Sean Purdy  wrote:

> Hi,
>
>
> We have a bucket that we are trying to empty.  Versioning and lifecycle
> was enabled.  We deleted all the objects in the bucket.  But this left a
> whole bunch of Delete Markers.
>
> aws s3api delete-object --bucket B --key K --version-id V is not deleting
> the delete markers.
>
> Any ideas?  We want to delete the bucket so we can reuse the bucket name.
> Alternatively, is there a way to delete a bucket that still contains delete
> markers?
>
>
> $ aws --profile=owner s3api list-object-versions --bucket bucket --prefix
> 0/0/00fff6df-863d-48b5-9089-cc6e7c5997e7
>
> {
>   "DeleteMarkers": [
> {
>   "Owner": {
> "DisplayName": "bucket owner",
> "ID": "owner"
>   },
>   "IsLatest": true,
>   "VersionId": "ZB8ty9c3hxjxV5izmIKM1QwDR6fwnsd",
>   "Key": "0/0/00fff6df-863d-48b5-9089-cc6e7c5997e7",
>   "LastModified": "2018-09-17T16:19:58.187Z"
> }
>   ]
> }
>
> $ aws --profile=owner s3api delete-object --bucket bucket --key
> 0/0/00fff6df-863d-48b5-9089-cc6e7c5997e7 --version-id
> ZB8ty9c3hxjxV5izmIKM1QwDR6fwnsd
>
> returns 0 but the delete marker remains.
>
>
> This bucket was created in 12.2.2, current version of ceph is 12.2.7 via
> 12.2.5
>
>
> Thanks,
>
> Sean
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] bluestore compression enabled but no data compressed

2018-10-09 Thread David Turner
When I've tested compression before there are 2 places you need to
configure compression.  On the OSDs in the configuration settings that you
mentioned, but also on the [1] pools themselves.  If you have the
compression mode on the pools set to none, then it doesn't matter what the
OSDs configuration is and vice versa unless you are using the setting of
force.  If you want to default compress everything, set pools to passive
and osds to aggressive.  If you want to only compress specific pools, set
the osds to passive and the specific pools to aggressive.  Good luck.


[1] http://docs.ceph.com/docs/mimic/rados/operations/pools/#set-pool-values
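
For example, to only compress one specific pool (the pool name is a
placeholder, the values are just the ones discussed above):

  # ceph.conf, [osd] or [global] section:
  bluestore_compression_mode = passive

  # then per pool:
  ceph osd pool set <pool> compression_mode aggressive
  ceph osd pool set <pool> compression_algorithm snappy
  ceph osd pool get <pool> compression_mode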

On Tue, Sep 18, 2018 at 7:11 AM Frank Schilder  wrote:

> I seem to have a problem getting bluestore compression to do anything. I
> followed the documentation and enabled bluestore compression on various
> pools by executing "ceph osd pool set  compression_mode
> aggressive". Unfortunately, it seems like no data is compressed at all. As
> an example, below is some diagnostic output for a data pool used by a
> cephfs:
>
> [root@ceph-01 ~]# ceph --version
> ceph version 12.2.7 (3ec878d1e53e1aeb47a9f619c49d9e7c0aa384d5) luminous
> (stable)
>
> All defaults are OK:
>
> [root@ceph-01 ~]# ceph --show-config | grep compression
> [...]
> bluestore_compression_algorithm = snappy
> bluestore_compression_max_blob_size = 0
> bluestore_compression_max_blob_size_hdd = 524288
> bluestore_compression_max_blob_size_ssd = 65536
> bluestore_compression_min_blob_size = 0
> bluestore_compression_min_blob_size_hdd = 131072
> bluestore_compression_min_blob_size_ssd = 8192
> bluestore_compression_mode = none
> bluestore_compression_required_ratio = 0.875000
> [...]
>
> Compression is reported as enabled:
>
> [root@ceph-01 ~]# ceph osd pool ls detail
> [...]
> pool 24 'sr-fs-data-test' erasure size 8 min_size 7 crush_rule 10
> object_hash rjenkins pg_num 50 pgp_num 50 last_change 7726 flags
> hashpspool,ec_overwrites stripe_width 24576 compression_algorithm snappy
> compression_mode aggressive application cephfs
> [...]
>
> [root@ceph-01 ~]# ceph osd pool get sr-fs-data-test compression_mode
> compression_mode: aggressive
> [root@ceph-01 ~]# ceph osd pool get sr-fs-data-test compression_algorithm
> compression_algorithm: snappy
>
> We dumped a 4Gib file with dd from /dev/zero. Should be easy to compress
> with excellent ratio. Search for a PG:
>
> [root@ceph-01 ~]# ceph pg ls-by-pool sr-fs-data-test
> PG_STAT OBJECTS MISSING_ON_PRIMARY DEGRADED MISPLACED UNFOUND BYTES
>  LOG DISK_LOG STATESTATE_STAMPVERSION  REPORTED UP
>  UP_PRIMARY ACTING   ACTING_PRIMARY
> LAST_SCRUB SCRUB_STAMPLAST_DEEP_SCRUB DEEP_SCRUB_STAMP
>
> 24.0 15  00 0   0  62914560
> 77   77 active+clean 2018-09-14 01:07:14.593007  7698'77 7735:142
> [53,47,36,30,14,55,57,5] 53 [53,47,36,30,14,55,57,5]
>  537698'77 2018-09-14 01:07:14.592966 0'0 2018-09-11
> 08:06:29.309010
>
> There is about 250MB data on the primary OSD, but noting seems to be
> compressed:
>
> [root@ceph-07 ~]# ceph daemon osd.53 perf dump | grep blue
> [...]
> "bluestore_allocated": 313917440,
> "bluestore_stored": 264362803,
> "bluestore_compressed": 0,
> "bluestore_compressed_allocated": 0,
> "bluestore_compressed_original": 0,
> [...]
>
> Just to make sure, I checked one of the objects' contents:
>
> [root@ceph-01 ~]# rados ls -p sr-fs-data-test
> 104.039c
> [...]
> 104.039f
>
> It is 4M chunks ...
> [root@ceph-01 ~]# rados -p sr-fs-data-test stat 104.039f
> sr-fs-data-test/104.039f mtime 2018-09-11 14:39:38.00,
> size 4194304
>
> ... with all zeros:
>
> [root@ceph-01 ~]# rados -p sr-fs-data-test get 104.039f obj
>
> [root@ceph-01 ~]# hexdump -C obj
>   00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00
> ||
> *
> 0040
>
> All as it should be, except for compression. Am I overlooking something?
>
> Best regards,
>
> =
> Frank Schilder
> AIT Risø Campus
> Bygning 109, rum S14
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Error-code 2002/API 405 S3 REST API. Creating a new bucket

2018-10-09 Thread David Turner
Can you outline the process you're using to access the REST API?  It's hard
to troubleshoot this without knowing how you were trying to do this.
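
For reference, a minimal way to exercise the same call path from the command
line (endpoint and bucket name are placeholders, credentials assumed to be
configured already):

  s3cmd --host=rgw.example.com --host-bucket=rgw.example.com mb s3://egobackup
  # or with the aws cli:
  aws --endpoint-url http://rgw.example.com s3api create-bucket --bucket egobackup

If those work, the problem is in how the original client forms the request
(addressing style, signature version); if they fail the same way, the rgw
log excerpt you attached is the place to keep digging.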

On Mon, Sep 17, 2018 at 7:09 PM Michael Schäfer 
wrote:

> Hi,
>
> We have a problem with the radosgw using the S3 REST API.
> Trying to create a new bucket does not work.
> We got an 405 on API level and the  log does indicate an 2002 error.
> Do anybody know, what this error-code does mean? Find the radosgw-log
> attached
>
> Bests,
> Michael
>
> 2018-09-17 11:58:03.388 7f65250c2700  1 == starting new request
> req=0x7f65250b9830 =
> 2018-09-17 11:58:03.388 7f65250c2700  2 req 20:0.20::GET
> /egobackup::initializing for trans_id =
> tx00014-005b9f88bb-d393-default
> 2018-09-17 11:58:03.388 7f65250c2700 10 rgw api priority: s3=5 s3website=4
> 2018-09-17 11:58:03.388 7f65250c2700 10 host=85.214.24.54
> 2018-09-17 11:58:03.388 7f65250c2700 20 subdomain= domain=
> in_hosted_domain=0 in_hosted_domain_s3website=0
> 2018-09-17 11:58:03.388 7f65250c2700 20 final domain/bucket subdomain=
> domain= in_hosted_domain=0 in_hosted_domain_s3website=0 s->info.domain=
> s->info.request_
> uri=/egobackup
> 2018-09-17 11:58:03.388 7f65250c2700 20 get_handler
> handler=25RGWHandler_REST_Bucket_S3
> 2018-09-17 11:58:03.388 7f65250c2700 10 handler=25RGWHandler_REST_Bucket_S3
> 2018-09-17 11:58:03.388 7f65250c2700  2 req 20:0.81:s3:GET
> /egobackup::getting op 0
> 2018-09-17 11:58:03.388 7f65250c2700 10
> op=32RGWGetBucketLocation_ObjStore_S3
> 2018-09-17 11:58:03.388 7f65250c2700  2 req 20:0.86:s3:GET
> /egobackup:get_bucket_location:verifying requester
> 2018-09-17 11:58:03.388 7f65250c2700 20
> rgw::auth::StrategyRegistry::s3_main_strategy_t: trying
> rgw::auth::s3::AWSAuthStrategy
> 2018-09-17 11:58:03.388 7f65250c2700 20 rgw::auth::s3::AWSAuthStrategy:
> trying rgw::auth::s3::S3AnonymousEngine
> 2018-09-17 11:58:03.388 7f65250c2700 20 rgw::auth::s3::S3AnonymousEngine
> denied with reason=-1
> 2018-09-17 11:58:03.388 7f65250c2700 20 rgw::auth::s3::AWSAuthStrategy:
> trying rgw::auth::s3::LocalEngine
> 2018-09-17 11:58:03.388 7f65250c2700 10 get_canon_resource():
> dest=/egobackup?location
> 2018-09-17 11:58:03.388 7f65250c2700 10 string_to_sign:
> GET
> 1B2M2Y8AsgTpgAmY7PhCfg==
>
> Mon, 17 Sep 2018 10:58:03 GMT
> /egobackup?location
> 2018-09-17 11:58:03.388 7f65250c2700 15 string_to_sign=GET
> 1B2M2Y8AsgTpgAmY7PhCfg==
>
> Mon, 17 Sep 2018 10:58:03 GMT
> /egobackup?location
> 2018-09-17 11:58:03.388 7f65250c2700 15 server
> signature=fbEd2DlKyKC8JOXTgMZSXV68ngc=
> 2018-09-17 11:58:03.388 7f65250c2700 15 client
> signature=fbEd2DlKyKC8JOXTgMZSXV68ngc=
> 2018-09-17 11:58:03.388 7f65250c2700 15 compare=0
> 2018-09-17 11:58:03.388 7f65250c2700 20 rgw::auth::s3::LocalEngine granted
> access
> 2018-09-17 11:58:03.388 7f65250c2700 20 rgw::auth::s3::AWSAuthStrategy
> granted access
> 2018-09-17 11:58:03.388 7f65250c2700  2 req 20:0.000226:s3:GET
> /egobackup:get_bucket_location:normalizing buckets and tenants
> 2018-09-17 11:58:03.388 7f65250c2700 10 s->object=
> s->bucket=egobackup
> 2018-09-17 11:58:03.388 7f65250c2700  2 req 20:0.000235:s3:GET
> /egobackup:get_bucket_location:init permissions
> 2018-09-17 11:58:03.388 7f65250c2700 20 get_system_obj_state:
> rctx=0x7f65250b7a30 obj=default.rgw.meta:root:egobackup
> state=0x55b1bc2e1220 s->prefetch_data=0
> 2018-09-17 11:58:03.388 7f65250c2700 10 cache get:
> name=default.rgw.meta+root+egobackup : miss
> 2018-09-17 11:58:03.388 7f65250c2700 10 cache put:
> name=default.rgw.meta+root+egobackup info.flags=0x0
> 2018-09-17 11:58:03.388 7f65250c2700 10 adding
> default.rgw.meta+root+egobackup to cache LRU end
> 2018-09-17 11:58:03.388 7f65250c2700 10 init_permissions on egobackup[]
> failed, ret=-2002
> 2018-09-17 11:58:03.388 7f65250c2700 20 op->ERRORHANDLER: err_no=-2002
> new_err_no=-2002
> 2018-09-17 11:58:03.388 7f65250c2700 30 AccountingFilter::send_status:
> e=0, sent=24, total=0
> 2018-09-17 11:58:03.388 7f65250c2700 30 AccountingFilter::send_header:
> e=0, sent=0, total=0
> 2018-09-17 11:58:03.388 7f65250c2700 30
> AccountingFilter::send_content_length: e=0, sent=21, total=0
> 2018-09-17 11:58:03.388 7f65250c2700 30 AccountingFilter::send_header:
> e=0, sent=0, total=0
> 2018-09-17 11:58:03.388 7f65250c2700 30 AccountingFilter::send_header:
> e=0, sent=0, total=0
> 2018-09-17 11:58:03.388 7f65250c2700 30 AccountingFilter::complete_header:
> e=0, sent=159, total=0
> 2018-09-17 11:58:03.388 7f65250c2700 30 AccountingFilter::set_account: e=1
> 2018-09-17 11:58:03.388 7f65250c2700 30 AccountingFilter::send_body: e=1,
> sent=219, total=0
> 2018-09-17 11:58:03.388 7f65250c2700 30
> AccountingFilter::complete_request: e=1, sent=0, total=219
> 2018-09-17 11:58:03.388 7f65250c2700  2 req 20:0.001272:s3:GET
> /egobackup:get_bucket_location:op status=0
> 2018-09-17 11:58:03.388 7f65250c2700  2 req 20:0.001276:s3:GET
> /egobackup:get_bucket_location:http status=404
> 2018-09-17 11:58:03.388 7f65250c2700  1 

Re: [ceph-users] radosgw bucket stats vs s3cmd du

2018-10-09 Thread David Turner
Have you looked at your Garbage Collection?  I would guess that your GC is
behind and that radosgw-admin is accounting for that space, knowing that it
hasn't been freed up yet, while s3cmd doesn't see it since it no longer
shows in the listing.
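
To check whether GC is what's holding the space (these are standard
radosgw-admin subcommands; --include-all also shows entries not yet due for
processing):

  radosgw-admin gc list --include-all | head
  radosgw-admin gc process

A long gc list that shrinks after a manual gc process run would confirm it.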

On Tue, Sep 18, 2018 at 4:45 AM Luis Periquito  wrote:

> Hi all,
>
> I have a couple of very big s3 buckets that store temporary data. We
> keep writing to the buckets some files which are then read and
> deleted. They serve as a temporary storage.
>
> We're writing (and deleting) circa 1TB of data daily in each of those
> buckets, and their size has been mostly stable over time.
>
> The issue has arisen that radosgw-admin bucket stats says one bucket
> is 10T and the other is 4T; but s3cmd du (and I did a sync which
> agrees) says 3.5T and 2.3T respectively.
>
> The bigger bucket suffered from the orphaned objects bug
> (http://tracker.ceph.com/issues/18331). The smaller was created as
> 10.2.3 so it may also had the suffered from the same bug.
>
> Any ideas what could be at play here? How can we reduce actual usage?
>
> trimming part of the radosgw-admin bucket stats output
> "usage": {
> "rgw.none": {
> "size": 0,
> "size_actual": 0,
> "size_utilized": 0,
> "size_kb": 0,
> "size_kb_actual": 0,
> "size_kb_utilized": 0,
> "num_objects": 18446744073709551572
> },
> "rgw.main": {
> "size": 10870197197183,
> "size_actual": 10873866362880,
> "size_utilized": 18446743601253967400,
> "size_kb": 10615426951,
> "size_kb_actual": 10619010120,
> "size_kb_utilized": 18014398048099578,
> "num_objects": 1702444
> },
> "rgw.multimeta": {
> "size": 0,
> "size_actual": 0,
> "size_utilized": 0,
> "size_kb": 0,
> "size_kb_actual": 0,
> "size_kb_utilized": 0,
> "num_objects": 406462
> }
> },
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] list admin issues

2018-10-06 Thread David C
Same issue here, Gmail user, member of different lists but only get
disabled on ceph-users. Happens about once a month but had three in Sept.

On Sat, 6 Oct 2018, 18:28 Janne Johansson,  wrote:

> On Sat, 6 Oct 2018 at 15:06, Elias Abacioglu
> wrote:
> >
> > Hi,
> >
> > I'm bumping this old thread cause it's getting annoying. My membership
> get disabled twice a month.
> > Between my two Gmail accounts I'm in more than 25 mailing lists and I
> see this behavior only here. Why is only ceph-users only affected? Maybe
> Christian was on to something, is this intentional?
> > Reality is that there is a lot of ceph-users with Gmail accounts,
> perhaps it wouldn't be so bad to actually trying to figure this one out?
> >
> > So can the maintainers of this list please investigate what actually
> gets bounced? Look at my address if you want.
> > I got disabled 20181006, 20180927, 20180916, 20180725, 20180718 most
> recently.
> > Please help!
>
> Same here.
>
>
> --
> May the most significant bit of your life be positive.
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Any backfill in our cluster makes the cluster unusable and takes forever

2018-10-01 Thread David Turner
I tried modifying filestore_rocksdb_options by removing
compression=kNoCompression
as well as setting it to compression=kSnappyCompression.  Leaving it with
kNoCompression or removing it results in the same segfault in the previous
log.  Setting it to kSnappyCompression resulted in [1] this being logged
and the OSD just failing to start instead of segfaulting.  Is there
anything else you would suggest trying before I purge this OSD from the
cluster?  I'm afraid it might be something with the CentOS binaries.

[1] 2018-10-01 17:10:37.134930 7f1415dfcd80  0  set rocksdb option
compression = kSnappyCompression
2018-10-01 17:10:37.134986 7f1415dfcd80 -1 rocksdb: Invalid argument:
Compression type Snappy is not linked with the binary.
2018-10-01 17:10:37.135004 7f1415dfcd80 -1
filestore(/var/lib/ceph/osd/ceph-1) mount(1723): Error initializing rocksdb
:
2018-10-01 17:10:37.135020 7f1415dfcd80 -1 osd.1 0 OSD:init: unable to
mount object store
2018-10-01 17:10:37.135029 7f1415dfcd80 -1 ESC[0;31m ** ERROR: osd init
failed: (1) Operation not permittedESC[0m

On Sat, Sep 29, 2018 at 1:57 PM Pavan Rallabhandi <
prallabha...@walmartlabs.com> wrote:

> I looked at one of my test clusters running Jewel on Ubuntu 16.04, and
> interestingly I found this(below) in one of the OSD logs, which is
> different from your OSD boot log, where none of the compression algorithms
> seem to be supported. This hints more at how rocksdb was built on CentOS
> for Ceph.
>
> 2018-09-29 17:38:38.629112 7fbd318d4b00  4 rocksdb: Compression algorithms
> supported:
> 2018-09-29 17:38:38.629112 7fbd318d4b00  4 rocksdb: Snappy supported: 1
> 2018-09-29 17:38:38.629113 7fbd318d4b00  4 rocksdb: Zlib supported: 1
> 2018-09-29 17:38:38.629113 7fbd318d4b00  4 rocksdb: Bzip supported: 0
> 2018-09-29 17:38:38.629114 7fbd318d4b00  4 rocksdb: LZ4 supported: 0
> 2018-09-29 17:38:38.629114 7fbd318d4b00  4 rocksdb: ZSTD supported: 0
> 2018-09-29 17:38:38.629115 7fbd318d4b00  4 rocksdb: Fast CRC32 supported: 0
>
> On 9/27/18, 2:56 PM, "Pavan Rallabhandi" 
> wrote:
>
> I see Filestore symbols on the stack, so the bluestore config doesn’t
> affect. And the top frame of the stack hints at a RocksDB issue, and there
> are a whole lot of these too:
>
> “2018-09-17 19:23:06.480258 7f1f3d2a7700  2 rocksdb:
> [/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/12.2.4/rpm/el7/BUILD/ceph-12.2.4/src/rocksdb/table/block_based_table_reader.cc:636]
> Cannot find Properties block from file.”
>
> It really seems to be something with RocksDB on centOS. I still think
> you can try removing “compression=kNoCompression” from the
> filestore_rocksdb_options And/Or check if rocksdb is expecting snappy to be
> enabled.
>
> Thanks,
> -Pavan.
>
> From: David Turner 
> Date: Thursday, September 27, 2018 at 1:18 PM
> To: Pavan Rallabhandi 
> Cc: ceph-users 
> Subject: EXT: Re: [ceph-users] Any backfill in our cluster makes the
> cluster unusable and takes forever
>
> I got pulled away from this for a while.  The error in the log is
> "abort: Corruption: Snappy not supported or corrupted Snappy compressed
> block contents" and the OSD has 2 settings set to snappy by default,
> async_compressor_type and bluestore_compression_algorithm.  Do either of
> these settings affect the omap store?
>
> On Wed, Sep 19, 2018 at 2:33 PM Pavan Rallabhandi  prallabha...@walmartlabs.com> wrote:
> Looks like you are running on CentOS, fwiw. We’ve successfully ran the
> conversion commands on Jewel, Ubuntu 16.04.
>
> Have a feel it’s expecting the compression to be enabled, can you try
> removing “compression=kNoCompression” from the filestore_rocksdb_options?
> And/or you might want to check if rocksdb is expecting snappy to be enabled.
>
> From: David Turner <drakonst...@gmail.com>
> Date: Tuesday, September 18, 2018 at 6:01 PM
> To: Pavan Rallabhandi <prallabha...@walmartlabs.com>
> Cc: ceph-users <ceph-users@lists.ceph.com>
> Subject: EXT: Re: [ceph-users] Any backfill in our cluster makes the
> cluster unusable and takes forever
>
> Here's the [1] full log from the time the OSD was started to the end
> of the crash dump.  These logs are so hard to parse.  Is there anything
> useful in them?
>
> I did confirm that all perms were set correctly and that the
> superblock was changed to rocksdb before the first time I attempted to
> start the OSD with it's new DB.  This is on a fully Luminous cluster with
> [2] the defaults you mentioned.
>
> [1]
> https://gist.github.com/drakonstein/fa3ac0ad9b2ec1389c957f95e05b7

Re: [ceph-users] mount cephfs from a public network ip of mds

2018-09-30 Thread David Turner
I doubt you have a use case that requires you to have a different public
and private network. Just use 1 network on the 10Gb nics. There have been
plenty of mailing list threads in the last year along with testing and
production experience that indicate that having the networks separated is
not needed for the vast majority of Ceph deployments. It generally just adds
complexity for no noticeable gains.
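
Concretely, that just means something like this in ceph.conf, using the 10Gb
subnet from your mail (or whichever single network your clients can reach)
and simply not defining a cluster network:

  [global]
  public_network = 10.32.67.0/24
  # no cluster_network line: OSD replication traffic also uses the public network

Clients, mons, MDS and OSDs then all talk over the one network.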

On Sun, Sep 30, 2018, 10:11 PM Joshua Chen 
wrote:

> Hello Paul,
>   Thanks for your reply.
>   Now my clients will be from 140.109 (LAN, the real ip network 1Gb/s) and
> from 10.32 (SAN, a closed 10Gb network). Could I make this public_network
> to be 0.0.0.0? so mon daemon listens on both 1Gb and 10Gb network?
>   Or could I have
> public_network = 140.109.169.0/24, 10.32.67.0/24
> cluster_network = 10.32.67.0/24
>
> does ceph allow 2 (multiple) public_network?
>
>   And I don't want to limit the client read/write speed to be 1Gb/s
> nics unless they don't have 10Gb nic installed. To guarantee clients
> read/write to osd (when they know the details of the location) they should
> be using the fastest nic (10Gb) when available. But other clients with only
> 1Gb nic will go through 140.109.0.0 (1Gb LAN) to ask mon or to read/write
> to osds. This is why my osds also have 1Gb and 10Gb nics with 140.109.0.0
> and 10.32.0.0 networking respectively.
>
> Cheers
> Joshua
>
> On Sun, Sep 30, 2018 at 12:09 PM David Turner 
> wrote:
>
>> The cluster/private network is only used by the OSDs. Nothing else in
>> ceph or its clients communicate using it. Everything other than osd to osd
>> communication uses the public network. That includes the MONs, MDSs,
>> clients, and anything other than an osd talking to an osd. Nothing else
>> other than osd to osd traffic can communicate on the private/cluster
>> network.
>>
>> On Sat, Sep 29, 2018, 6:43 AM Paul Emmerich 
>> wrote:
>>
>>> All Ceph clients will always first connect to the mons. Mons provide
>>> further information on the cluster such as the IPs of MDS and OSDs.
>>>
>>> This means you need to provide the mon IPs to the mount command, not
>>> the MDS IPs. Your first command works by coincidence since
>>> you seem to run the mons and MDS' on the same server.
>>>
>>>
>>> Paul
>>> On Sat, 29 Sep 2018 at 12:07, Joshua Chen
>>> wrote:
>>> >
>>> > Hello all,
>>> >   I am testing the cephFS cluster so that clients could mount -t ceph.
>>> >
>>> >   the cluster has 6 nodes, 3 mons (also mds), and 3 osds.
>>> >   All these 6 nodes has 2 nic, one 1Gb nic with real ip (140.109.0.0)
>>> and 1 10Gb nic with virtual ip (10.32.0.0)
>>> >
>>> > 140.109. Nic1 1G<-MDS1->Nic2 10G 10.32.
>>> > 140.109. Nic1 1G<-MDS2->Nic2 10G 10.32.
>>> > 140.109. Nic1 1G<-MDS3->Nic2 10G 10.32.
>>> > 140.109. Nic1 1G<-OSD1->Nic2 10G 10.32.
>>> > 140.109. Nic1 1G<-OSD2->Nic2 10G 10.32.
>>> > 140.109. Nic1 1G<-OSD3->Nic2 10G 10.32.
>>> >
>>> >
>>> >
>>> > and I have the following questions:
>>> >
>>> > 1, can I have both public (140.109.0.0) and cluster (10.32.0.0)
>>> clients all be able to mount this cephfs resource
>>> >
>>> > I want to do
>>> >
>>> > (in a 140.109 network client)
>>> > mount -t ceph mds1(140.109.169.48):/ /mnt/cephfs -o user=,secret=
>>> >
>>> > and also in a 10.32.0.0 network client)
>>> > mount -t ceph mds1(10.32.67.48):/
>>> > /mnt/cephfs -o user=,secret=
>>> >
>>> >
>>> >
>>> >
>>> > Currently, only this 10.32.0.0 clients can mount it. that of public
>>> network (140.109) can not. How can I enable this?
>>> >
>>> > here attached is my ceph.conf
>>> >
>>> > Thanks in advance
>>> >
>>> > Cheers
>>> > Joshua
>>> > ___
>>> > ceph-users mailing list
>>> > ceph-users@lists.ceph.com
>>> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>>
>>>
>>>
>>> --
>>> Paul Emmerich
>>>
>>> Looking for help with your Ceph cluster? Contact us at https://croit.io
>>>
>>> croit GmbH
>>> Freseniusstr. 31h
>>> 81247 München
>>> www.croit.io
>>> Tel: +49 89 1896585 90
>>> ___
>>> ceph-users mailing list
>>> ceph-users@lists.ceph.com
>>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>>
>>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] mount cephfs from a public network ip of mds

2018-09-29 Thread David Turner
The cluster/private network is only used by the OSDs. Nothing else in ceph
or its clients communicates using it. Everything other than osd to osd
communication uses the public network. That includes the MONs, MDSs,
clients, and anything other than an osd talking to an osd. Nothing else
other than osd to osd traffic can communicate on the private/cluster
network.
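
As a concrete illustration of the point Paul made (the addresses are the
ones from the original mail, the key is a placeholder), clients on either
network would point the mount at a mon address reachable from that network,
not at an MDS:

  mount -t ceph 140.109.169.48:6789:/ /mnt/cephfs -o name=admin,secret=<key>
  mount -t ceph 10.32.67.48:6789:/ /mnt/cephfs -o name=admin,secret=<key>

For the first one to work the mons have to be reachable on the 140.109
network, which is what the public_network setting controls.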

On Sat, Sep 29, 2018, 6:43 AM Paul Emmerich  wrote:

> All Ceph clients will always first connect to the mons. Mons provide
> further information on the cluster such as the IPs of MDS and OSDs.
>
> This means you need to provide the mon IPs to the mount command, not
> the MDS IPs. Your first command works by coincidence since
> you seem to run the mons and MDS' on the same server.
>
>
> Paul
> On Sat, 29 Sep 2018 at 12:07, Joshua Chen
> wrote:
> >
> > Hello all,
> >   I am testing the cephFS cluster so that clients could mount -t ceph.
> >
> >   the cluster has 6 nodes, 3 mons (also mds), and 3 osds.
> >   All these 6 nodes has 2 nic, one 1Gb nic with real ip (140.109.0.0)
> and 1 10Gb nic with virtual ip (10.32.0.0)
> >
> > 140.109. Nic1 1G<-MDS1->Nic2 10G 10.32.
> > 140.109. Nic1 1G<-MDS2->Nic2 10G 10.32.
> > 140.109. Nic1 1G<-MDS3->Nic2 10G 10.32.
> > 140.109. Nic1 1G<-OSD1->Nic2 10G 10.32.
> > 140.109. Nic1 1G<-OSD2->Nic2 10G 10.32.
> > 140.109. Nic1 1G<-OSD3->Nic2 10G 10.32.
> >
> >
> >
> > and I have the following questions:
> >
> > 1, can I have both public (140.109.0.0) and cluster (10.32.0.0) clients
> all be able to mount this cephfs resource
> >
> > I want to do
> >
> > (in a 140.109 network client)
> > mount -t ceph mds1(140.109.169.48):/ /mnt/cephfs -o user=,secret=
> >
> > and also in a 10.32.0.0 network client)
> > mount -t ceph mds1(10.32.67.48):/
> > /mnt/cephfs -o user=,secret=
> >
> >
> >
> >
> > Currently, only this 10.32.0.0 clients can mount it. that of public
> network (140.109) can not. How can I enable this?
> >
> > here attached is my ceph.conf
> >
> > Thanks in advance
> >
> > Cheers
> > Joshua
> > ___
> > ceph-users mailing list
> > ceph-users@lists.ceph.com
> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
>
>
> --
> Paul Emmerich
>
> Looking for help with your Ceph cluster? Contact us at https://croit.io
>
> croit GmbH
> Freseniusstr. 31h
> 81247 München
> www.croit.io
> Tel: +49 89 1896585 90
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Any backfill in our cluster makes the cluster unusable and takes forever

2018-09-27 Thread David Turner
I got pulled away from this for a while.  The error in the log is "abort:
Corruption: Snappy not supported or corrupted Snappy compressed block
contents" and the OSD has 2 settings set to snappy by default,
async_compressor_type and bluestore_compression_algorithm.  Do either of
these settings affect the omap store?
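
One way to see what a running OSD actually has for these (osd.1 is just an
example id; any OSD that is up and has its admin socket available works):

  ceph daemon osd.1 config get filestore_omap_backend
  ceph daemon osd.1 config get filestore_rocksdb_options
  ceph daemon osd.1 config get bluestore_compression_algorithm
  ceph daemon osd.1 config get async_compressor_type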

On Wed, Sep 19, 2018 at 2:33 PM Pavan Rallabhandi <
prallabha...@walmartlabs.com> wrote:

> Looks like you are running on CentOS, fwiw. We’ve successfully ran the
> conversion commands on Jewel, Ubuntu 16.04.
>
> Have a feel it’s expecting the compression to be enabled, can you try
> removing “compression=kNoCompression” from the filestore_rocksdb_options?
> And/or you might want to check if rocksdb is expecting snappy to be enabled.
>
> From: David Turner 
> Date: Tuesday, September 18, 2018 at 6:01 PM
> To: Pavan Rallabhandi 
> Cc: ceph-users 
> Subject: EXT: Re: [ceph-users] Any backfill in our cluster makes the
> cluster unusable and takes forever
>
> Here's the [1] full log from the time the OSD was started to the end of
> the crash dump.  These logs are so hard to parse.  Is there anything useful
> in them?
>
> I did confirm that all perms were set correctly and that the superblock
> was changed to rocksdb before the first time I attempted to start the OSD
> with it's new DB.  This is on a fully Luminous cluster with [2] the
> defaults you mentioned.
>
> [1] https://gist.github.com/drakonstein/fa3ac0ad9b2ec1389c957f95e05b79ed
> [2] "filestore_omap_backend": "rocksdb",
> "filestore_rocksdb_options":
> "max_background_compactions=8,compaction_readahead_size=2097152,compression=kNoCompression",
>
> On Tue, Sep 18, 2018 at 5:29 PM Pavan Rallabhandi  prallabha...@walmartlabs.com> wrote:
> I meant the stack trace hints that the superblock still has leveldb in it,
> have you verified that already?
>
> On 9/18/18, 5:27 PM, "Pavan Rallabhandi"  prallabha...@walmartlabs.com> wrote:
>
> You should be able to set them under the global section and that
> reminds me, since you are on Luminous already, I guess those values are
> already the default, you can verify from the admin socket of any OSD.
>
> But the stack trace didn’t hint as if the superblock on the OSD is
> still considering the omap backend to be leveldb and to do with the
> compression.
>
> Thanks,
> -Pavan.
>
> From: David Turner <drakonst...@gmail.com>
> Date: Tuesday, September 18, 2018 at 5:07 PM
> To: Pavan Rallabhandi <prallabha...@walmartlabs.com>
> Cc: ceph-users <ceph-users@lists.ceph.com>
> Subject: EXT: Re: [ceph-users] Any backfill in our cluster makes the
> cluster unusable and takes forever
>
> Are those settings fine to have be global even if not all OSDs on a
> node have rocksdb as the backend?  Or will I need to convert all OSDs on a
> node at the same time?
>
> On Tue, Sep 18, 2018 at 5:02 PM Pavan Rallabhandi <
> prallabha...@walmartlabs.com> wrote:
> The steps that were outlined for conversion are correct, have you
> tried setting some of the relevant ceph conf values too:
>
> filestore_rocksdb_options =
> "max_background_compactions=8;compaction_readahead_size=2097152;compression=kNoCompression"
>
> filestore_omap_backend = rocksdb
>
> Thanks,
> -Pavan.
>
> From: ceph-users <ceph-users-boun...@lists.ceph.com> on
> behalf of David Turner <drakonst...@gmail.com>
> Date: Tuesday, September 18, 2018 at 4:09 PM
> To: ceph-users <ceph-users@lists.ceph.com>
> Subject: EXT: [ceph-users] Any backfill in our cluster makes the
> cluster unusable and takes forever
>
> I've finally learned enough about the OSD backend to track down this
> issue to what I believe is the root cause.  LevelDB compaction is the
> common thread every time we move data around our cluster.  I've ruled out
> PG subfolder splitting, EC doesn't seem to be the root cause of this, and
> it is cluster wide as opposed to specific hardware.
>
> One of the first things I found after digging into leveldb omap
> compaction was [1] this article with a heading "RocksDB instead of LevelDB"
> which mentions that leveldb was replaced with rocksdb as the default db
> backend for filestore OSDs and was even backported to Jewel because of the
> performance improvements.
>
> I figured there must be a way to be able to upgrade an OSD to use
> rocksdb from leveldb without needing to fully backfill the entire OSD.
> There is [2] this article, but you need to have an active service account
> with RedHat to access

Re: [ceph-users] Mimic upgrade failure

2018-09-20 Thread David Turner
the lease isn't renewed instead of calling for an election?
>
> One thing I tried was to shutdown the entire cluster and bring up only
> the mon and mgr. The mons weren't able to hold their quorum with no osds
> running and the ceph-mon ms_dispatch thread runs at 100% for > 60s at a
> time.
>
>
>
> This is odd... with no other dameons running I'm not sure what would be
> eating up the CPU.  Can you run a 'perf top -p `pidof ceph-mon`' (or
> similar) on the machine to see what the process is doing?  You might need
> to install ceph-mon-dbg or ceph-debuginfo to get better symbols.
>
>
>
> 2018-09-19 03:56:21.729 7f4344ec1700 1 mon.sephmon2@1(peon).paxos(paxos
> active c 133382665..133383355) lease_timeout -- calling new election
>
>
>
> A workaround is probably to increase the lease timeout.  Try setting
> mon_lease = 15 (default is 5... could also go higher than 15) in the
> ceph.conf for all of the mons.  This is a bit of a band-aid but should
> help you keep the mons in quorum until we sort out what is going on.
>
> sage
>
>
>
>
>
> Thanks
> Kevin
>
> On 09/10/2018 07:06 AM, Sage Weil wrote:
>
> I took a look at the mon log you sent.  A few things I noticed:
>
> - The frequent mon elections seem to get only 2/3 mons about half of the
> time.
> - The messages coming in are mostly osd_failure, and half of those seem to
> be recoveries (cancellation of the failure message).
>
> It does smell a bit like a networking issue, or some tunable that relates
> to the messaging layer.  It might be worth looking at an OSD log for an
> osd that reported a failure and seeing what error code is coming up on the
> failed ping connection?  That might provide a useful hint (e.g.,
> ECONNREFUSED vs EMFILE or something).
>
> I'd also confirm that with nodown set the mon quorum stabilizes...
>
> sage
>
>
>
>
> On Mon, 10 Sep 2018, Kevin Hrpcek wrote:
>
>
>
> Update for the list archive.
>
> I went ahead and finished the mimic upgrade with the osds in a fluctuating
> state of up and down. The cluster did start to normalize a lot easier after
> everything was on mimic since the random mass OSD heartbeat failures stopped
> and the constant mon election problem went away. I'm still battling with the
> cluster reacting poorly to host reboots or small map changes, but I feel like
> my current pg:osd ratio may be playing a factor in that since we are 2x normal
> pg count while migrating data to new EC pools.
>
> I'm not sure of the root cause but it seems like the mix of luminous and mimic
> did not play well together for some reason. Maybe it has to do with the scale
> of my cluster, 871 osds, or maybe I've missed some tuning as my cluster
> has scaled to this size.
>
> Kevin
>
>
> On 09/09/2018 12:49 PM, Kevin Hrpcek wrote:
>
>
> Nothing too crazy for non default settings. Some of those osd settings were
> in place while I was testing recovery speeds and need to be brought back
> closer to defaults. I was setting nodown before but it seems to mask the
> problem. While it's good to stop the osdmap changes, OSDs would come up, get
> marked up, but at some point go down again (but the process is still
> running) and still stay up in the map. Then when I'd unset nodown the
> cluster would immediately mark 250+ osd down again and i'd be back where I
> started.
>
> This morning I went ahead and finished the osd upgrades to mimic to remove
> that variable. I've looked for networking problems but haven't found any. 2
> of the mons are on the same switch. I've also tried combinations of shutting
> down a mon to see if a single one was the problem, but they keep electing no
> matter the mix of them that are up. Part of it feels like a networking
> problem but I haven't been able to find a culprit yet as everything was
> working normally before starting the upgrade. Other than the constant mon
> elections, yesterday I had the cluster 95% healthy 3 or 4 times, but it
> doesn't last long since at some point the OSDs start trying to fail each
> other through their heartbeats.
> 2018-09-09 17:37:29.079 7eff774f5700  1 mon.sephmon1@0(leader).osd e991282
> prepare_failure osd.39 10.1.9.2:6802/168438 from osd.49 10.1.9.3:6884/317908
> is reporting failure:1
> 2018-09-09 17:37:29.079 7eff774f5700  0 log_channel(cluster) log [DBG] :
> osd.39 10.1.9.2:6802/168438 reported failed by osd.49 10.1.9.3:6884/317908
> 2018-09-09 17:37:29.083 7eff774f5700  1 mon.sephmon1@0(leader).osd e991282
> prepare_failure osd.93 10.1.9.9:6853/287469 from osd.372 10.1.9.13:6801/275806
> is reporting failure:1
>
> I'm working on getting things mostly good again with everything on mimic and
> will 

Re: [ceph-users] Ceph Mimic packages not available for Ubuntu Trusty

2018-09-19 Thread David Turner
No, Ceph Mimic will not be available for Ubuntu Trusty 14.04.  That release
is almost 4.5 years old now, you should start planning towards an OS
upgrade.
On Wed, Sep 19, 2018 at 8:54 AM Jakub Jaszewski 
wrote:

> Hi Cephers,
>
> Any plans for Ceph Mimic packages for Ubuntu Trusty? I found only
> ceph-deploy.
> https://download.ceph.com/debian-mimic/dists/trusty/main/binary-amd64/
>
> Thanks
> Jakub
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Any backfill in our cluster makes the cluster unusable and takes forever

2018-09-18 Thread David Turner
Here's the [1] full log from the time the OSD was started to the end of the
crash dump.  These logs are so hard to parse.  Is there anything useful in
them?

I did confirm that all perms were set correctly and that the superblock was
changed to rocksdb before the first time I attempted to start the OSD with
its new DB.  This is on a fully Luminous cluster with [2] the defaults you
mentioned.

[1] https://gist.github.com/drakonstein/fa3ac0ad9b2ec1389c957f95e05b79ed
[2] "filestore_omap_backend": "rocksdb",
"filestore_rocksdb_options":
"max_background_compactions=8,compaction_readahead_size=2097152,compression=kNoCompression",

On Tue, Sep 18, 2018 at 5:29 PM Pavan Rallabhandi <
prallabha...@walmartlabs.com> wrote:

> I meant the stack trace hints that the superblock still has leveldb in it,
> have you verified that already?
>
> On 9/18/18, 5:27 PM, "Pavan Rallabhandi" 
> wrote:
>
> You should be able to set them under the global section and that
> reminds me, since you are on Luminous already, I guess those values are
> already the default, you can verify from the admin socket of any OSD.
>
> But the stack trace didn’t hint as if the superblock on the OSD is
> still considering the omap backend to be leveldb and to do with the
> compression.
>
> Thanks,
> -Pavan.
>
> From: David Turner 
> Date: Tuesday, September 18, 2018 at 5:07 PM
> To: Pavan Rallabhandi 
> Cc: ceph-users 
> Subject: EXT: Re: [ceph-users] Any backfill in our cluster makes the
> cluster unusable and takes forever
>
> Are those settings fine to have be global even if not all OSDs on a
> node have rocksdb as the backend?  Or will I need to convert all OSDs on a
> node at the same time?
>
> On Tue, Sep 18, 2018 at 5:02 PM Pavan Rallabhandi <prallabha...@walmartlabs.com> wrote:
> The steps that were outlined for conversion are correct, have you
> tried setting some of the relevant ceph conf values too:
>
> filestore_rocksdb_options =
> "max_background_compactions=8;compaction_readahead_size=2097152;compression=kNoCompression"
>
> filestore_omap_backend = rocksdb
>
> Thanks,
> -Pavan.
>
> From: ceph-users <ceph-users-boun...@lists.ceph.com> on behalf
> of David Turner <drakonst...@gmail.com>
> Date: Tuesday, September 18, 2018 at 4:09 PM
> To: ceph-users <ceph-users@lists.ceph.com>
> Subject: EXT: [ceph-users] Any backfill in our cluster makes the
> cluster unusable and takes forever
>
> I've finally learned enough about the OSD backend to track down this
> issue to what I believe is the root cause.  LevelDB compaction is the
> common thread every time we move data around our cluster.  I've ruled out
> PG subfolder splitting, EC doesn't seem to be the root cause of this, and
> it is cluster wide as opposed to specific hardware.
>
> One of the first things I found after digging into leveldb omap
> compaction was [1] this article with a heading "RocksDB instead of LevelDB"
> which mentions that leveldb was replaced with rocksdb as the default db
> backend for filestore OSDs and was even backported to Jewel because of the
> performance improvements.
>
> I figured there must be a way to be able to upgrade an OSD to use
> rocksdb from leveldb without needing to fully backfill the entire OSD.
> There is [2] this article, but you need to have an active service account
> with RedHat to access it.  I eventually came across [3] this article about
> optimizing Ceph Object Storage which mentions a resolution to OSDs flapping
> due to omap compaction to migrate to using rocksdb.  It links to the RedHat
> article, but also has [4] these steps outlined in it.  I tried to follow
> the steps, but the OSD I tested this on was unable to start with [5] this
> segfault.  And then trying to move the OSD back to the original LevelDB
> omap folder resulted in [6] this in the log.  I apologize that all of my
> logging is with log level 1.  If needed I can get some higher log levels.
>
> My Ceph version is 12.2.4.  Does anyone have any suggestions for how I
> can update my filestore backend from leveldb to rocksdb?  Or if that's the
> wrong direction and I should be looking elsewhere?  Thank you.
>
>
> [1] https://ceph.com/community/new-luminous-rados-improvements/
> [2] https://access.redhat.com/solutions/3210951
> [3]
> https://hubb.blob.core.windows.net/c2511cea-81c5-4386-8731-cc444ff806df-public/resources/Optimize%20Ceph%20object%20storage%20for%20production%20in%20multisite%20clouds.pdf
>
> [4] ■ Stop the OSD
> ■ mv /var/lib/ceph/osd/ceph-/current/omap
> /var/lib/ceph/osd/ceph-/omap.orig
> ■ ul

Re: [ceph-users] Any backfill in our cluster makes the cluster unusable and takes forever

2018-09-18 Thread David Turner
Are those settings fine to have be global even if not all OSDs on a node
have rocksdb as the backend?  Or will I need to convert all OSDs on a node
at the same time?

On Tue, Sep 18, 2018 at 5:02 PM Pavan Rallabhandi <
prallabha...@walmartlabs.com> wrote:

> The steps that were outlined for conversion are correct, have you tried
> setting some of the relevant ceph conf values too:
>
> filestore_rocksdb_options =
> "max_background_compactions=8;compaction_readahead_size=2097152;compression=kNoCompression"
>
> filestore_omap_backend = rocksdb
>
> Thanks,
> -Pavan.
>
> From: ceph-users  on behalf of David
> Turner 
> Date: Tuesday, September 18, 2018 at 4:09 PM
> To: ceph-users 
> Subject: EXT: [ceph-users] Any backfill in our cluster makes the cluster
> unusable and takes forever
>
> I've finally learned enough about the OSD backend to track down this issue to
> what I believe is the root cause.  LevelDB compaction is the common thread
> every time we move data around our cluster.  I've ruled out PG subfolder
> splitting, EC doesn't seem to be the root cause of this, and it is cluster
> wide as opposed to specific hardware.
>
> One of the first things I found after digging into leveldb omap compaction
> was [1] this article with a heading "RocksDB instead of LevelDB"
> which mentions that leveldb was replaced with rocksdb as the default db
> backend for filestore OSDs and was even backported to Jewel because of the
> performance improvements.
>
> I figured there must be a way to be able to upgrade an OSD to use rocksdb
> from leveldb without needing to fully backfill the entire OSD.  There is
> [2] this article, but you need to have an active service account with
> RedHat to access it.  I eventually came across [3] this article about
> optimizing Ceph Object Storage which mentions a resolution to OSDs flapping
> due to omap compaction to migrate to using rocksdb.  It links to the RedHat
> article, but also has [4] these steps outlined in it.  I tried to follow
> the steps, but the OSD I tested this on was unable to start with [5] this
> segfault.  And then trying to move the OSD back to the original LevelDB
> omap folder resulted in [6] this in the log.  I apologize that all of my
> logging is with log level 1.  If needed I can get some higher log levels.
>
> My Ceph version is 12.2.4.  Does anyone have any suggestions for how I can
> update my filestore backend from leveldb to rocksdb?  Or if that's the
> wrong direction and I should be looking elsewhere?  Thank you.
>
>
> [1] https://ceph.com/community/new-luminous-rados-improvements/
> [2] https://access.redhat.com/solutions/3210951
> [3]
> https://hubb.blob.core.windows.net/c2511cea-81c5-4386-8731-cc444ff806df-public/resources/Optimize%20Ceph%20object%20storage%20for%20production%20in%20multisite%20clouds.pdf
>
> [4] ■ Stop the OSD
> ■ mv /var/lib/ceph/osd/ceph-/current/omap /var/lib/ceph/osd/ceph-/omap.orig
> ■ ulimit -n 65535
> ■ ceph-kvstore-tool leveldb /var/lib/ceph/osd/ceph-/omap.orig store-copy
> /var/lib/ceph/osd/ceph-/current/omap 1 rocksdb
> ■ ceph-osdomap-tool --omap-path /var/lib/ceph/osd/ceph-/current/omap
> --command check
> ■ sed -i s/leveldb/rocksdb/g /var/lib/ceph/osd/ceph-/superblock
> ■ chown ceph.ceph /var/lib/ceph/osd/ceph-/current/omap -R
> ■ cd /var/lib/ceph/osd/ceph-; rm -rf omap.orig
> ■ Start the OSD
>
> [5] 2018-09-17 19:23:10.826227 7f1f3f2ab700 -1 abort: Corruption: Snappy
> not supported or corrupted Snappy compressed block contents
> 2018-09-17 19:23:10.830525 7f1f3f2ab700 -1 *** Caught signal (Aborted) **
>
> [6] 2018-09-17 19:27:34.010125 7fcdee97cd80 -1 osd.0 0 OSD:init: unable to
> mount object store
> 2018-09-17 19:27:34.010131 7fcdee97cd80 -1 ESC[0;31m ** ERROR: osd init
> failed: (1) Operation not permittedESC[0m
> 2018-09-17 19:27:54.225941 7f7f03308d80  0 set uid:gid to 167:167
> (ceph:ceph)
> 2018-09-17 19:27:54.225975 7f7f03308d80  0 ceph version 12.2.4
> (52085d5249a80c5f5121a76d6288429f35e4e77b) luminous (stable), process
> (unknown), pid 361535
> 2018-09-17 19:27:54.231275 7f7f03308d80  0 pidfile_write: ignore empty
> --pid-file
> 2018-09-17 19:27:54.260207 7f7f03308d80  0 load: jerasure load: lrc load:
> isa
> 2018-09-17 19:27:54.260520 7f7f03308d80  0
> filestore(/var/lib/ceph/osd/ceph-0) backend xfs (magic 0x58465342)
> 2018-09-17 19:27:54.261135 7f7f03308d80  0
> filestore(/var/lib/ceph/osd/ceph-0) backend xfs (magic 0x58465342)
> 2018-09-17 19:27:54.261750 7f7f03308d80  0
> genericfilestorebackend(/var/lib/ceph/osd/ceph-0) detect_features: FIEMAP
> ioctl is disabled via 'filestore fiemap' config option
> 2018-09-17 19:27:54.261757 7f7f03308d80  0
> genericfilestorebackend(/var/lib/ceph/osd/ceph-0) detect_features

[ceph-users] Any backfill in our cluster makes the cluster unusable and takes forever

2018-09-18 Thread David Turner
I've finally learned enough about the OSD backend to track down this issue to
what I believe is the root cause.  LevelDB compaction is the common thread
every time we move data around our cluster.  I've ruled out PG subfolder
splitting, EC doesn't seem to be the root cause of this, and it is cluster
wide as opposed to specific hardware.

One of the first things I found after digging into leveldb omap compaction
was [1] this article with a heading "RocksDB instead of LevelDB"
which mentions that leveldb was replaced with rocksdb as the default db
backend for filestore OSDs and was even backported to Jewel because of the
performance improvements.

I figured there must be a way to be able to upgrade an OSD to use rocksdb
from leveldb without needing to fully backfill the entire OSD.  There is
[2] this article, but you need to have an active service account with
RedHat to access it.  I eventually came across [3] this article about
optimizing Ceph Object Storage which mentions a resolution to OSDs flapping
due to omap compaction to migrate to using rocksdb.  It links to the RedHat
article, but also has [4] these steps outlined in it.  I tried to follow
the steps, but the OSD I tested this on was unable to start with [5] this
segfault.  And then trying to move the OSD back to the original LevelDB
omap folder resulted in [6] this in the log.  I apologize that all of my
logging is with log level 1.  If needed I can get some higher log levels.

My Ceph version is 12.2.4.  Does anyone have any suggestions for how I can
update my filestore backend from leveldb to rocksdb?  Or if that's the
wrong direction and I should be looking elsewhere?  Thank you.


[1] https://ceph.com/community/new-luminous-rados-improvements/
[2] https://access.redhat.com/solutions/3210951
[3]
https://hubb.blob.core.windows.net/c2511cea-81c5-4386-8731-cc444ff806df-public/resources/Optimize%20Ceph%20object%20storage%20for%20production%20in%20multisite%20clouds.pdf

[4] ■ Stop the OSD
■ mv /var/lib/ceph/osd/ceph-/current/omap /var/lib/ceph/osd/ceph-/omap.orig
■ ulimit -n 65535
■ ceph-kvstore-tool leveldb /var/lib/ceph/osd/ceph-/omap.orig store-copy
/var/lib/ceph/osd/ceph-/current/omap 1 rocksdb
■ ceph-osdomap-tool --omap-path /var/lib/ceph/osd/ceph-/current/omap
--command check
■ sed -i s/leveldb/rocksdb/g /var/lib/ceph/osd/ceph-/superblock
■ chown ceph.ceph /var/lib/ceph/osd/ceph-/current/omap -R
■ cd /var/lib/ceph/osd/ceph-; rm -rf omap.orig
■ Start the OSD

[5] 2018-09-17 19:23:10.826227 7f1f3f2ab700 -1 abort: Corruption: Snappy
not supported or corrupted Snappy compressed block contents
2018-09-17 19:23:10.830525 7f1f3f2ab700 -1 *** Caught signal (Aborted) **

[6] 2018-09-17 19:27:34.010125 7fcdee97cd80 -1 osd.0 0 OSD:init: unable to
mount object store
2018-09-17 19:27:34.010131 7fcdee97cd80 -1 ESC[0;31m ** ERROR: osd init
failed: (1) Operation not permittedESC[0m
2018-09-17 19:27:54.225941 7f7f03308d80  0 set uid:gid to 167:167
(ceph:ceph)
2018-09-17 19:27:54.225975 7f7f03308d80  0 ceph version 12.2.4
(52085d5249a80c5f5121a76d6288429f35e4e77b) luminous (stable), process
(unknown), pid 361535
2018-09-17 19:27:54.231275 7f7f03308d80  0 pidfile_write: ignore empty
--pid-file
2018-09-17 19:27:54.260207 7f7f03308d80  0 load: jerasure load: lrc load:
isa
2018-09-17 19:27:54.260520 7f7f03308d80  0
filestore(/var/lib/ceph/osd/ceph-0) backend xfs (magic 0x58465342)
2018-09-17 19:27:54.261135 7f7f03308d80  0
filestore(/var/lib/ceph/osd/ceph-0) backend xfs (magic 0x58465342)
2018-09-17 19:27:54.261750 7f7f03308d80  0
genericfilestorebackend(/var/lib/ceph/osd/ceph-0) detect_features: FIEMAP
ioctl is disabled via 'filestore fiemap' config option
2018-09-17 19:27:54.261757 7f7f03308d80  0
genericfilestorebackend(/var/lib/ceph/osd/ceph-0) detect_features:
SEEK_DATA/SEEK_HOLE is disabled via 'filestore seek data hole' config option
2018-09-17 19:27:54.261758 7f7f03308d80  0
genericfilestorebackend(/var/lib/ceph/osd/ceph-0) detect_features: splice()
is disabled via 'filestore splice' config option
2018-09-17 19:27:54.286454 7f7f03308d80  0
genericfilestorebackend(/var/lib/ceph/osd/ceph-0) detect_features:
syncfs(2) syscall fully supported (by glibc and kernel)
2018-09-17 19:27:54.286572 7f7f03308d80  0
xfsfilestorebackend(/var/lib/ceph/osd/ceph-0) detect_feature: extsize is
disabled by conf
2018-09-17 19:27:54.287119 7f7f03308d80  0
filestore(/var/lib/ceph/osd/ceph-0) start omap initiation
2018-09-17 19:27:54.287527 7f7f03308d80 -1
filestore(/var/lib/ceph/osd/ceph-0) mount(1723): Error initializing leveldb
: Corruption: VersionEdit: unknown tag
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] can we drop support of centos/rhel 7.4?

2018-09-14 Thread David Turner
It's odd to me because this feels like the opposite direction of the rest
of Ceph. Making management and operating Ceph simpler and easier. Requiring
fast OS upgrades on dot releases of Ceph versions is not that direction at
all.

On Fri, Sep 14, 2018, 9:25 AM David Turner  wrote:

> Release dates
> RHEL 7.4 - July 2017
> Luminous 12.2.0 - August 2017
> CentOS 7.4 - September 2017
> RHEL 7.5 - April 2018
> CentOS 7.5 - May 2018
> Mimic 13.2.0 - June 2018
>
> In the world of sysadmins it takes time to let new releases/OS's simmer
> before beginning to test them let alone upgrading to them. It is not
> possible to tell all companies that use CentOS that we have to move to a
> new OS upgrade 5 months after it is released. We are still testing if
> CentOS 7.5 works in our infrastructure in general let alone being up and
> running on it. The kernel upgrades alone are a big change, not to mention
> the obvious package version changes. We don't even have the OK to install
> it in staging. Once we do, and we have the time to start testing it,
> ...among our other tasks, we can start regression testing our use case in
> staging before thinking about upgrading prod.
>
> That time frame isn't really so bad if everything is working great for
> ceph, but what if we're waiting on 12.2.9 and 13.2.2 for a bugfix that's
> giving us grief? Now we are not only dealing with the bugs, but now we have
> to regression test an OS upgrade, update our package management, and make
> sure our new deployments will have this version... And then we can start
> regression testing the new release that hopefully fixes the bugs we're
> dealing with...
>
> What about backporting the API standards to the CentOS 7.4 version of
> gperftools-libs?
>
> I've noticed little package issues like this in the past, but assumed that
> was because most development was done on Ubuntu instead of RHEL. We had to
> set our repos to a newer version of CentOS than we were running or willing
> to upgrade to just for a single package we needed. If y'all are really
> thinking of only supporting/testing the latest dot release of the latest
> major version of RHEL, then you might have just given me the fuel to be
> able to finally convince my company into allowing us to be the first
> application in 9,000 servers to not run CentOS. I've been trying to get
> them to allow it for a while because of the previous package issues, but I
> hadn't put much effort into it because I thought/hoped those problems might
> be behind us...
>
> Do y'all not test ceph on 7.3 right now? This email thread really might be
> enough to get us off of CentOS for Ceph.
>
> On Fri, Sep 14, 2018, 5:49 AM John Spray  wrote:
>
>> On Fri, Sep 14, 2018 at 3:48 AM kefu chai  wrote:
>> >
>> > hi ceph-{maintainers,users,developers},
>> >
>> > recently, i ran into an issue[0] which popped up when we build Ceph on
>> > centos 7.5, but test it on centos 7.4. as we know, the gperftools-libs
>> > package provides the tcmalloc allocator shared library, but centos 7.4
>> > and centos 7.5 ship different version of gperftools-{devel,libs}. the
>> > former ships 2.4, and the latter 2.6.1.
>> >
>> > the crux is that the tcmalloc in gperftools 2.6.1 implements more
>> > standard compliant C++ APIs, which were missing in gperftools 2.4.
>> > that's why we have failures like:
>> >
>> > ceph-osd: symbol lookup error: ceph-osd: undefined symbol: _ZdaPvm
>> >
>> > when testing Ceph on centos 7.4.
>> >
>> > my question is: is it okay to drop the support of centos/rhel 7.4? so
>> > we will solely build and test the supported Ceph releases (luminous,
>> > mimic) on 7.5 ?
>>
>> My preference would be to target the latest minor release (i.e. 7.5)
>> of the major release.  We don't test on CentOS 7.1, 7.2 etc, so I
>> don't think we need to give 7.4 any special treatment.
>>
>> John
>>
>> >
>> > thanks,
>> >
>> > --
>> > [0] http://tracker.ceph.com/issues/35969
>> >
>> > --
>> > Regards
>> > Kefu Chai
>> ___
>> ceph-users mailing list
>> ceph-users@lists.ceph.com
>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] can we drop support of centos/rhel 7.4?

2018-09-14 Thread David Turner
Release dates
RHEL 7.4 - July 2017
Luminous 12.2.0 - August 2017
CentOS 7.4 - September 2017
RHEL 7.5 - April 2018
CentOS 7.5 - May 2018
Mimic 13.2.0 - June 2018

In the world of sysadmins it takes time to let new releases/OS's simmer
before beginning to test them let alone upgrading to them. It is not
possible to tell all companies that use CentOS that we have to move to a
new OS upgrade 5 months after it is released. We are still testing if
CentOS 7.5 works in our infrastructure in general let alone being up and
running on it. The kernel upgrades alone are a big change now to mention
the obvious package version changes. We don't even have the OK to install
it in staging. Once we do, and we have the time to start testing it,
...among our other tasks, we can start regression testing our use case in
staging before thinking about upgrading prod.

That time frame isn't really so bad if everything is working great for
ceph, but what if we're waiting on 12.2.9 and 13.2.2 for a bugfix that's
giving us grief? Now we are not only dealing with the bugs, but now we have
to regression test an OS upgrade, update our package management, and make
sure our new deployments will have this version... And then we can start
regression testing the new release that hopefully fixes the bugs we're
dealing with...

What about backporting the API standards to the CentOS 7.4 version of
gperftools-libs?

I've noticed little package issues like this in the past, but assumed that
was because most development was done on Ubuntu instead of RHEL. We had to
set our repos to a newer version of CentOS than we were running or willing
to upgrade to just for a single package we needed. If y'all are really
thinking of only supporting/testing the latest dot release of the latest
major version of RHEL, then you might have just given me the fuel to be
able to finally convince my company into allowing us to be the first
application in 9,000 servers to not run CentOS. I've been trying to get
them to allow it for a while because of the previous package issues, but I
hadn't put much effort into it because I thought/hoped those problems might
be behind us...

Do y'all not test ceph on 7.3 right now? This email thread really might be
enough to get us off of CentOS for Ceph.

On Fri, Sep 14, 2018, 5:49 AM John Spray  wrote:

> On Fri, Sep 14, 2018 at 3:48 AM kefu chai  wrote:
> >
> > hi ceph-{maintainers,users,developers},
> >
> > recently, i ran into an issue[0] which popped up when we build Ceph on
> > centos 7.5, but test it on centos 7.4. as we know, the gperftools-libs
> > package provides the tcmalloc allocator shared library, but centos 7.4
> > and centos 7.5 ship different version of gperftools-{devel,libs}. the
> > former ships 2.4, and the latter 2.6.1.
> >
> > the crux is that the tcmalloc in gperftools 2.6.1 implements more
> > standard compliant C++ APIs, which were missing in gperftools 2.4.
> > that's why we have failures like:
> >
> > ceph-osd: symbol lookup error: ceph-osd: undefined symbol: _ZdaPvm
> >
> > when testing Ceph on centos 7.4.
> >
> > my question is: is it okay to drop the support of centos/rhel 7.4? so
> > we will solely build and test the supported Ceph releases (luminous,
> > mimic) on 7.5 ?
>
> My preference would be to target the latest minor release (i.e. 7.5)
> of the major release.  We don't test on CentOS 7.1, 7.2 etc, so I
> don't think we need to give 7.4 any special treatment.
>
> John
>
> >
> > thanks,
> >
> > --
> > [0] http://tracker.ceph.com/issues/35969
> >
> > --
> > Regards
> > Kefu Chai
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Updating CRUSH Tunables to Jewel from Hammer

2018-09-13 Thread David Turner
I have a stage cluster with 4 HDDs and an SSD in each host.  I have an EC
profile that specifically chooses HDDs for placement.  Also several Replica
pools that write to either HDD or SSD.  This has all worked well for a
while.  When I updated the Tunables to Jewel on the cluster, all of a
sudden the data for the EC profile started placing it's data on the SSDs
and filling them up.  Setting the CRUSH Tunables back to Hammer reverts
this change and all is well again.

The odd part is that it's not like it's choosing to mix the data on HDDs
and SSDs, it just moves the data to all SSDs and off of the HDDs.  Has
anyone else experienced this or know what is causing it to choose to place
the EC PGs on the wrong device-class?

[1] This is the rule in question.


[1]
{
"rule_id": 2,
"rule_name": "local-stage.rgw.buckets.data",
"ruleset": 2,
"type": 3,
"min_size": 3,
"max_size": 5,
"steps": [
{
"op": "set_chooseleaf_tries",
"num": 5
},
{
"op": "set_choose_tries",
"num": 100
},
{
"op": "take",
"item": -24,
"item_name": "default~hdd"
},
{
"op": "chooseleaf_indep",
"num": 0,
"type": "host"
},
{
"op": "emit"
}
]
}
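
For anyone who wants to reproduce the comparison, a rough sketch with crushtool
(rule id 2 and --num-rep 5 are taken from the rule above, the file names are
placeholders, and this assumes your crushtool build supports
--set-chooseleaf-stable):

ceph osd getcrushmap -o cm.bin
crushtool -i cm.bin --test --rule 2 --num-rep 5 --show-utilization
crushtool -i cm.bin --set-chooseleaf-stable 1 -o cm.jewel.bin
crushtool -i cm.jewel.bin --test --rule 2 --num-rep 5 --show-utilization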
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Performance predictions moving bluestore wall, db to ssd

2018-09-12 Thread David Turner
Sorry, I was wrong that it was you.  I just double checked.  But there is a
new thread as of this morning on this topic, titled "Benchmark does not show
gains with DB on SSD", where someone is running benchmark tests with numbers.
On Wed, Sep 12, 2018 at 12:20 PM David Turner  wrote:

> You already have a thread talking about benchmarking the addition of WAL
> and DB partitions to an OSD.  Why are you creating a new one about the
> exact same thing?  As with everything, the performance increase isn't even
> solely answerable by which drives you have, there are a lot of factors that
> could introduce a bottleneck in your cluster.  But again, why create a new
> thread for the exact same topic?
>
> On Wed, Sep 12, 2018 at 12:06 PM Marc Roos 
> wrote:
>
>>
>> When having a hdd bluestore osd with collocated wal and db.
>>
>>
>> - What performance increase can be expected if one would move the wal to
>> an ssd?
>>
>> - What performance increase can be expected if one would move the db to
>> an ssd?
>>
>> - Would the performance gain be large if you have a very slow hdd (and thus
>> not so much when you have a very fast hdd (sas 15k))
>>
>> - It would be best to move the wal first to the ssd, and then maybe also
>> the db?
>>
>> In this CERN video (https://youtu.be/OopRMUYiY5E?t=931) of 2015 they are
>> talking about 5-10x increase etc. But that is filestore of course.
>>
>>
>>
>>
>>
>>
>>
>> ___
>> ceph-users mailing list
>> ceph-users@lists.ceph.com
>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Performance predictions moving bluestore wall, db to ssd

2018-09-12 Thread David Turner
You already have a thread talking about benchmarking the addition of WAL
and DB partitions to an OSD.  Why are you creating a new one about the
exact same thing?  As with everything, the performance increase isn't even
solely answerable by which drives you have, there are a lot of factors that
could introduce a bottleneck in your cluster.  But again, why create a new
thread for the exact same topic?

On Wed, Sep 12, 2018 at 12:06 PM Marc Roos  wrote:

>
> When having a hdd bluestore osd with collocated wal and db.
>
>
> - What performance increase can be expected if one would move the wal to
> an ssd?
>
> - What performance increase can be expected if one would move the db to
> an ssd?
>
> - Would the performance gain be large if you have a very slow hdd (and thus
> not so much when you have a very fast hdd (sas 15k))
>
> - It would be best to move the wal first to the ssd, and then maybe also
> the db?
>
> In this CERN video (https://youtu.be/OopRMUYiY5E?t=931) of 2015 they are
> talking about 5-10x increase etc. But that is filestore of course.
>
>
>
>
>
>
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Benchmark does not show gains with DB on SSD

2018-09-12 Thread David Turner
If your writes are small enough (64k or smaller) they're being placed on
the WAL device regardless of where your DB is.  If you change your testing
to use larger writes you should see a difference by adding the DB.

Please note that the community has never recommended using less than 120GB
DB for a 12TB OSD and the docs have come out and officially said that you
should use at least a 480GB DB for a 12TB OSD.  If you're setting up your
OSDs with a 30GB DB, you're just going to fill that up really quick and
spill over onto the HDD and have wasted your money on the SSDs.
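
As a rough sketch of what I mean, assuming a throwaway pool named testpool:

rados bench -p testpool 60 write -t 16 -b 65536
rados bench -p testpool 60 write -t 16 -b 4194304

The 64k run stays on the deferred-write/WAL path, the 4M run does not, so the
second one is the better test of whether the DB device is actually helping.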

On Wed, Sep 12, 2018 at 11:07 AM Ján Senko  wrote:

> We are benchmarking a test machine which has:
> 8 cores, 64GB RAM
> 12 * 12 TB HDD (SATA)
> 2 * 480 GB SSD (SATA)
> 1 * 240 GB SSD (NVME)
> Ceph Mimic
>
> Baseline benchmark for HDD only (Erasure Code 4+2)
> Write 420 MB/s, 100 IOPS, 150ms latency
> Read 1040 MB/s, 260 IOPS, 60ms latency
>
> Now we moved WAL to the SSD (all 12 WALs on single SSD, default size
> (512MB)):
> Write 640 MB/s, 160 IOPS, 100ms latency
> Read identical as above.
>
> Nice boost we thought, so we moved WAL+DB to the SSD (Assigned 30GB for DB)
> All results are the same as above!
>
> Q: This is suspicious, right? Why is the DB on SSD not helping with our
> benchmark? We use *rados bench*
>
> We tried putting WAL on the NVME, and again, the results are the same as
> on SSD.
> Same for WAL+DB on NVME
>
> Again, the same speed. Any ideas why we don't gain speed by using faster
> HW here?
>
> Jan
>
> --
> Jan Senko, Skype janos-
> Phone in Switzerland: +41 774 144 602
> Phone in Czech Republic: +420 777 843 818
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Ceph balancer "Error EAGAIN: compat weight-set not available"

2018-09-11 Thread David Turner
ceph balancer status
ceph config-key dump | grep balancer
ceph osd dump | grep min_compat_client
ceph osd crush dump | grep straw
ceph osd crush dump | grep profile
ceph features

You didn't mention it, but based on your error and my experiences over the
last week getting the balancer working, you're trying to use crush-compat.
Running all of those commands should give you the information you need to
fix everything up for the balancer to work.  With the first 2, you need to
make sure that you have your mode set properly as well as double check any
other settings you're going for with the balancer.  Everything else stems
off of a requirement of having your buckets being straw2 instead of straw
for the balancer to work.  I'm sure you'll notice that your cluster has
older compatibility requirements and crush profile than hammer and that
your buckets are using the straw algorithm instead of straw2.

Running [1] these commands will fix up your cluster so that you are now
using straw2 and have your minimum required clients and profile set to hammer,
which is the ceph release that introduced straw2.  Before running these
commands make sure that the output of `ceph features` does not show any
firefly clients connected to your cluster.  If you do have any, it is
likely due to outdated kernels or clients installed without the upstream
ceph repo and just using the version of ceph in the canonical repos or
similar for your distribution.  If you do happen to have any firefly, or
older, clients connected to your cluster, then you need to update those
clients before running the commands.

There will be some data movement, but I didn't see more than ~5% data
movement on any of the 8 clusters I ran them on.  That data movement will
be higher if you do not have a standard size of OSD drive in your cluster;
a mix of some 2TB disks and some 8TB disks across your cluster will probably
cause some more data movement than I saw, but it should still be within
reason.  This data movement is because straw2 can handle that situation
better than straw did and will allow your cluster to better balance itself
even without the balancer module.

If you don't even have any hammer clients, then go ahead and set the
min-compat-client to jewel as well as the crush tunables to jewel.  Setting
them to Jewel will cause a bit more data movement, but again for good
reasons.
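
If that's your situation, the jewel equivalents of the commands in [1] would
look something like this (check ceph features first, as above):

ceph osd set-require-min-compat-client jewel
ceph osd crush tunables jewel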

The tl;dr of your error is that your cluster has been running since at
least hammer which started with older default settings than are required by
the balancer module.  As you've updated your cluster you didn't allow it to
utilize new features in the backend by leaving your crush tunables alone
during all of the upgrades to new versions.  To learn more about the
changes to the crush tunables you can check out the ceph wiki [2] here.

[1]
ceph osd set-require-min-compat-client hammer
ceph osd crush set-all-straw-buckets-to-straw2
ceph osd crush tunables hammer

[2] http://docs.ceph.com/docs/master/rados/operations/crush-map/

On Tue, Sep 11, 2018 at 6:24 AM Marc Roos  wrote:

>
> I am new to using the balancer; I think this should generate a plan,
> no? I do not get what this error is about.
>
>
> [@c01 ~]# ceph balancer optimize balancer-test.plan
> Error EAGAIN: compat weight-set not available
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] rbd-nbd on CentOS

2018-09-10 Thread David Turner
Now that you mention it, I remember those threads on the ML.  What happens
if you use --yes-i-really-mean-it to do those things and then later you try
to map an RBD with an older kernel for CentOS 7.3 or 7.4?  Will that
mapping fail because of the min-client-version of luminous set on the
cluster while allowing CentOS 7.5 clients map RBDs?

On Mon, Sep 10, 2018 at 1:33 PM Ilya Dryomov  wrote:

> On Mon, Sep 10, 2018 at 7:19 PM David Turner 
> wrote:
> >
> > I haven't found any mention of this on the ML and Google's results are
> all about compiling your own kernel to use NBD on CentOS. Is everyone
> that's using rbd-nbd on CentOS honestly compiling their own kernels for the
> clients? This feels like something that shouldn't be necessary anymore.
> >
> > I would like to use the balancer module with upmap, but can't do that
> with kRBD because even the latest kernels still register as Jewel. What
> have y'all done to use rbd-nbd on CentOS? I'm hoping I'm missing something
> and not that I'll need to compile a kernel to use on all of the hosts that
> I want to map RBDs to.
>
> FWIW upmap is fully supported since 4.13 and RHEL 7.5:
>
>   https://www.spinics.net/lists/ceph-users/msg45071.html
>
> http://lists.ceph.com/pipermail/ceph-users-ceph.com/2018-August/029105.html
>
> Thanks,
>
> Ilya
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] rbd-nbd on CentOS

2018-09-10 Thread David Turner
I haven't found any mention of this on the ML and Google's results are all
about compiling your own kernel to use NBD on CentOS. Is everyone that's
using rbd-nbd on CentOS honestly compiling their own kernels for the
clients? This feels like something that shouldn't be necessary anymore.

I would like to use the balancer module with upmap, but can't do that with
kRBD because even the latest kernels still register as Jewel. What have
y'all done to use rbd-nbd on CentOS? I'm hoping I'm missing something and
not that I'll need to compile a kernel to use on all of the hosts that I
want to map RBDs to.

Alternatively there's rbd-fuse, but in its current state it's too slow for
me. There's a [1] PR for an update to rbd-fuse that is promising. I have
seen the custom version of this rbd-fuse in action and it's really
impressive on speed. It can pretty much keep pace with the kernel client.
However, even if that does get merged, it'll be quite a while before it's
back-ported into a release.

[1] https://github.com/ceph/ceph/pull/23270
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] upgrade jewel to luminous with ec + cache pool

2018-09-10 Thread David Turner
Yes, migrating to 12.2.8 is fine. Migrating to not use the cache tier is as
simple as changing the ec pool mode to allow EC overwrites, changing the
cache tier mode to forward, flushing the tier, and removing it. Basically
once you have EC overwrites just follow the steps in the docs for removing
a cache tier.
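
A rough sketch of those steps, with pool names as placeholders (note that
allow_ec_overwrites also requires the EC pool to be on bluestore):

ceph osd pool set ec_data allow_ec_overwrites true
ceph osd tier cache-mode cache_pool forward --yes-i-really-mean-it
rados -p cache_pool cache-flush-evict-all
ceph osd tier remove-overlay ec_data
ceph osd tier remove ec_data cache_pool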

On Mon, Sep 10, 2018, 7:29 AM Markus Hickel  wrote:

> Dear all,
>
> i am running a cephfs cluster (jewel 10.2.10) with a ec + cache pool.
> There is a thread in the ML that states skipping 10.2.11 and going to
> 11.2.8 is possible, does this work with ec + cache pool aswell ?
>
> I also wanted to ask if there is a recommended migration path from cephfs
> with ec + cache pool to cephfs with ec pool only ? Creating a second cephfs
> and moving the files would come to my mind, but maybe there is a smarter
> way ?
>
> Cheers,
> Markus
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Mixing EC and Replicated pools on HDDs in Ceph RGW Luminous

2018-09-09 Thread David Turner
You can indeed have multiple types of pools on the same disks. Go ahead and
put the non-ec pool with a replicated ruleset on the HDDs with the EC data
pool. I believe you're correct that the non-ec pool gets cleared out when the
upload is complete and the file is flushed to EC.
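
For example, with Luminous device classes something like this should do it (the
rule and pool names are just placeholders):

ceph osd crush rule create-replicated replicated-hdd default host hdd
ceph osd pool set default.rgw.buckets.non-ec crush_rule replicated-hdd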

On Sun, Sep 9, 2018, 9:49 PM Nhat Ngo  wrote:

> Hi all,
>
>
> I am setting up RadosGW and Ceph cluster on Luminous. I am using EC for 
> `buckets.data`
> pool on HDD osds, is it okay to put `buckets.non-ec` pool with replicated
> ruleset for multi-parts upload on the same HDD osds? Will there be issues
> with mixing EC and replicated pools on the same disk types?
>
>
> We have a use case where users will upload large files up to 1TB each and are
> unable to fit this pool into our metadata NVMe SSD osds. My assumption on
> `buckets.non-ec` pool is that the objects on this pool will get cleared
> once the whole file is uploaded and transferred over to the EC pool. Is my
> understanding correct?
>
>
> Best regards,
>
> *Nhat Ngo* | Ops Engineer
>
> University of Melbourne, 3010, VIC
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Safe to use RBD mounts for Docker volumes on containerized Ceph nodes

2018-09-08 Thread David Turner
The problem is with the kernel pagecache. If that is still shared in a
containerized environment with the OSDs in containers and RBDs which are
mapped on the node outside of containers, then it is indeed still a
problem. I would guess that's the case, but I do not know for certain.
Using rbd-nbd instead of krbd bypasses this problem and you can ignore it.
Only using krbd is problematic.
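
For example, a userspace mapping that avoids krbd entirely (pool/image names
are placeholders):

rbd-nbd map rbd/myimage
rbd-nbd unmap /dev/nbd0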

On Thu, Sep 6, 2018, 6:43 PM Jacob DeGlopper  wrote:

> I've seen the requirement not to mount RBD devices or CephFS filesystems
> on OSD nodes.  Does this still apply when the OSDs and clients using the
> RBD volumes are all in Docker containers?
>
> That is, is it possible to run a 3-server setup in production with both
> Ceph daemons (mon, mgr, and OSD) in containers, along with applications
> in containers using Ceph as shared storage (Elasticsearch, gitlab, etc)?
>
>  -- jacob
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Mimic upgrade failure

2018-09-08 Thread David Turner
What osd/mon/etc config settings do you have that are not default? It might
be worth utilizing nodown to stop osds from marking each other down and
finish the upgrade to be able to set the minimum osd version to mimic. Stop
the osds in a node, manually mark them down, start them back up in mimic.
Depending on how bad things are, setting pause on the cluster to just
finish the upgrade faster might not be a bad idea either.
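
A rough sketch of that sequence (the osd ids are per-node examples, adjust to
your layout):

ceph osd set nodown
ceph osd set pause        # only if pausing client IO during the upgrade is acceptable
# stop the OSDs on one node, mark them down, upgrade the packages, start them on mimic
ceph osd down 12 13 14 15
# once every OSD is running mimic:
ceph osd require-osd-release mimic
ceph osd unset pause
ceph osd unset nodown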

This should be a simple question, have you confirmed that there are no
networking problems between the MONs while the elections are happening?

On Sat, Sep 8, 2018, 7:52 PM Kevin Hrpcek 
wrote:

> Hey Sage,
>
> I've posted the file with my email address for the user. It is with
> debug_mon 20/20, debug_paxos 20/20, and debug ms 1/5. The mons are calling
> for elections about every minute so I let this run for a few elections and
> saw this node become the leader a couple times. Debug logs start around
> 23:27:30. I had managed to get about 850/857 osds up, but it seems that
> within the last 30 min it has all gone bad again due to the OSDs reporting
> each other as failed. We relaxed the osd_heartbeat_interval to 30 and
> osd_heartbeat_grace to 60 in an attempt to slow down how quickly OSDs are
> trying to fail each other. I'll put in the rocksdb_cache_size setting.
>
> Thanks for taking a look.
>
> Kevin
>
> On 09/08/2018 06:04 PM, Sage Weil wrote:
>
> Hi Kevin,
>
> I can't think of any major luminous->mimic changes off the top of my head
> that would impact CPU usage, but it's always possible there is something
> subtle.  Can you ceph-post-file the full log from one of your mons
> (preferably the leader)?
>
> You might try adjusting the rocksdb cache size.. try setting
>
>  rocksdb_cache_size = 1342177280   # 10x the default, ~1.3 GB
>
> on the mons and restarting?
>
> Thanks!
> sage
>
> On Sat, 8 Sep 2018, Kevin Hrpcek wrote:
>
>
> Hello,
>
> I've had a Luminous -> Mimic upgrade go very poorly and my cluster is stuck
> with almost all pgs down. One problem is that the mons have started to
> re-elect a new quorum leader almost every minute. This is making it difficult
> to monitor the cluster and even run any commands on it since at least half the
> time a ceph command times out or takes over a minute to return results. I've
> looked at the debug logs and it appears there is some timeout occurring with
> paxos of about a minute. The msg_dispatch thread of the mons is often running
> a core at 100% for about a minute(user time, no iowait). Running strace on it
> shows the process is going through all of the mon db files (about 6gb in
> store.db/*.sst). Does anyone have an idea of what this timeout is or why my
> mons are always reelecting? One theory I have is that the msg_dispatch can't
> process the SST's fast enough and hits some timeout for a health check and the
> mon drops itself from the quorum since it thinks it isn't healthy. I've been
> thinking of introducing a new mon to the cluster on hardware with a better cpu
> to see if that can process the SSTs within this timeout.
>
> My cluster has the mons,mds,mgr and 30/41 osd servers on mimic, and 11/41 osd
> servers on luminous. The original problem started when I restarted the osds on
> one of the hosts. The cluster reacted poorly to them going down and went into
> a frenzy of taking down other osds and remapping. I eventually got that stable
> and the PGs were 90% good with the finish line in sight and then the mons
> started their issue of re-electing every minute. Now I can't keep any decent
> amount of PGs up for more than a few hours. This started on Wednesday.
>
> Any help would be greatly appreciated.
>
> Thanks,
> Kevin
>
> --Debug snippet from a mon at reelection time
> 2018-09-07 20:08:08.655 7f57b92cd700 20 mon.sephmon2@1(leader).mds e14242
> maybe_resize_cluster in 1 max 1
> 2018-09-07 20:08:08.655 7f57b92cd700  4 mon.sephmon2@1(leader).mds e14242
> tick: resetting beacon timeouts due to mon delay (slow election?) of 59.8106s
> seconds
> 2018-09-07 20:08:08.655 7f57b92cd700 10
> mon.sephmon2@1(leader).paxosservice(mdsmap 13504..14242) maybe_trim trim_to
> 13742 would only trim 238 < paxos_service_trim_min 250
> 2018-09-07 20:08:08.655 7f57b92cd700 10 mon.sephmon2@1(leader).auth v120657
> auth
> 2018-09-07 20:08:08.655 7f57b92cd700 10 mon.sephmon2@1(leader).auth v120657
> check_rotate updated rotating
> 2018-09-07 20:08:08.655 7f57b92cd700 10
> mon.sephmon2@1(leader).paxosservice(auth 120594..120657) propose_pending
> 2018-09-07 20:08:08.655 7f57b92cd700 10 mon.sephmon2@1(leader).auth v120657
> encode_pending v 120658
> 2018-09-07 20:08:08.655 7f57b92cd700  5 mon.sephmon2@1(leader).paxos(paxos
> updating c 132917556..132918214) queue_pending_finisher 0x55dce8e5b370
> 2018-09-07 20:08:08.655 7f57b92cd700 10 mon.sephmon2@1(leader).paxos(paxos
> updating c 132917556..132918214) trigger_propose not active, will propose
> later
> 2018-09-07 20:08:08.655 7f57b92cd700  4 mon.sephmon2@1(leader).mgr e2234 tick:
> 

Re: [ceph-users] advice with erasure coding

2018-09-08 Thread David Turner
I tested running VMs on EC back in Hammer. The performance was just bad. I
didn't even need much io, but even performing standard maintenance was
annoying enough that I abandoned the idea. I didn't really try to tweak
settings to make it work and I only had a 3 node cluster running 2+1. I did
use it for write once/read many data volumes which worked great. I
eventually moved away from that on RBDs and migrated into EC on CephFS once
that became stable in Jewel. Now on Luminous I've even been able to remove
the cache tier I once had in front of all of the EC things.
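
In case it's useful, the Luminous way of pointing CephFS data at EC without a
cache tier is roughly this (pool, fs, and directory names are placeholders):

ceph osd pool set ec_data allow_ec_overwrites true
ceph fs add_data_pool cephfs ec_data
setfattr -n ceph.dir.layout.pool -v ec_data /mnt/cephfs/somedir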

On Fri, Sep 7, 2018, 5:19 PM Maged Mokhtar  wrote:

> On 2018-09-07 13:52, Janne Johansson wrote:
>
>
>
> Den fre 7 sep. 2018 kl 13:44 skrev Maged Mokhtar :
>
>>
>> Good day Cephers,
>>
>> I want to get some guidance on erasure coding, the docs do state the
>> different plugins and settings but to really understand them all and their
>> use cases is not easy:
>>
>> -Are the majority of implementations using jerasure and just configuring
>> k and m ?
>>
>
> Probably, yes
>
>
>> -For jerasure: when/if would i need to change
>> stripe_unit/osd_pool_erasure_code_stripe_unit/packetsize/algorithm ? The
>> main usage is rbd with 4M object size, the workload is virtualization with
>> average block size of 64k.
>>
>> Any help based on people's actual experience will be greatly appreciated..
>>
>>
>>
> Running VMs on top of EC pools is possible, but probably not recommended.
> All the random reads and writes they usually cause will make EC less
> suitable than replicated pools, even if it is possible.
>
> --
> May the most significant bit of your life be positive.
>
> Point well taken...it could be useful for backing up vms, and maybe vms
> without too much latency requirements if k and m are not large.
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] CephFS tar archiving immediately after writing

2018-09-07 Thread David Turner
In searching the code for rbytes it makes a lot of sense how this is useful
for quotas in general.  While nothing references this variable in the
ceph-fuse code, it is in the general client config options as
`client_dirsize_rbytes = false`.  Setting that in the config file and
remounting ceph-fuse removed the sizes being displayed from the folders and
resolved the errors when immediately tarring a folder after modifying files
in it.
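
In other words, something like this on the client, followed by a remount of
ceph-fuse (the [client] section is an assumption here, adjust to wherever you
keep client options):

[client]
    client_dirsize_rbytes = false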

Thank you Greg for your help.

On Fri, Sep 7, 2018 at 2:52 PM Gregory Farnum  wrote:

> There's an option when mounting the FS on the client to not display those
> (on the kernel it's "norbytes"; see
> http://docs.ceph.com/docs/master/man/8/mount.ceph/?highlight=recursive; I
> didn't poke around to find it on ceph-fuse but it should be there).
> Calculating them is not very expensive (or at least, the expense is
> intrinsic to other necessary functions) so you can't disable it on the
> server.
> -Greg
>
> On Fri, Sep 7, 2018 at 11:48 AM David Turner 
> wrote:
>
>> Is it possible to disable this feature?  Very few filesystems
>> calculate the size of its folder's contents.  I know I enjoy it in multiple
>> use cases, but there are some use cases where this is not useful and a
>> cause for unnecessary lag/processing.  I'm not certain how this is
>> calculated, but I could imagine some of those use cases with millions of
>> files in cephfs that waste time calculating a folder size that nobody looks
>> at is not ideal.
>>
>> On Fri, Sep 7, 2018 at 2:11 PM Gregory Farnum  wrote:
>>
>>> Hmm, I *think* this might be something we've seen before and is the
>>> result of our recursive statistics (ie, the thing that makes directory
>>> sizes reflect the data within them instead of 1 block size). If that's the
>>> case it should resolve within a few seconds to maybe tens of seconds under
>>> stress?
>>> But there's also some work to force a full flush of those rstats up the
>>> tree to enable good differential backups. Not sure what the status of that
>>> is.
>>> -Greg
>>>
>>> On Fri, Sep 7, 2018 at 11:06 AM David Turner 
>>> wrote:
>>>
>>>> We have an existing workflow that we've moved from one server sharing a
>>>> local disk via NFS to secondary servers to all of them mounting CephFS.
>>>> The primary server runs a script similar to [1] this, but since we've moved
>>>> it into CephFS, we get [2] this error.  We added the sync in there to try
>>>> to help this, but it didn't have an effect.
>>>>
>>>> Does anyone have a suggestion other than looping over a sleep to wait
>>>> for the tar to succeed?  Waiting just a few seconds to run tar does work,
>>>> but during a Ceph recovery situation, I can see that needing to be longer
>>>> and longer.
>>>>
>>>>
>>>> [1] #!/bin/bash
>>>> cp -R /tmp/17857283/db.sql /cephfs/17857283/
>>>> sync
>>>> tar --ignore-failed-read -cvzf /cephfs/17857283.tgz /cephfs/17857283
>>>>
>>>> [2] tar: Removing leading `/' from member names
>>>> /cephfs/17857283/
>>>> /cephfs/17857283/db.sql
>>>> tar: /cephfs/17857283: file changed as we read it
>>>>
>>> ___
>>>> ceph-users mailing list
>>>> ceph-users@lists.ceph.com
>>>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>>>
>>>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] CephFS tar archiving immediately after writing

2018-09-07 Thread David Turner
Is it possible to disable this feature?  Very few filesystems calculate
the size of their folders' contents.  I know I enjoy it in multiple use
cases, but there are some use cases where this is not useful and a cause
for unnecessary lag/processing.  I'm not certain how this is calculated,
but for use cases with millions of files in CephFS, wasting time calculating
a folder size that nobody ever looks at is not ideal.

On Fri, Sep 7, 2018 at 2:11 PM Gregory Farnum  wrote:

> Hmm, I *think* this might be something we've seen before and is the result
> of our recursive statistics (ie, the thing that makes directory sizes
> reflect the data within them instead of 1 block size). If that's the case
> it should resolve within a few seconds to maybe tens of seconds under
> stress?
> But there's also some work to force a full flush of those rstats up the
> tree to enable good differential backups. Not sure what the status of that
> is.
> -Greg
>
> On Fri, Sep 7, 2018 at 11:06 AM David Turner 
> wrote:
>
>> We have an existing workflow that we've moved from one server sharing a
>> local disk via NFS to secondary servers to all of them mounting CephFS.
>> The primary server runs a script similar to [1] this, but since we've moved
>> it into CephFS, we get [2] this error.  We added the sync in there to try
>> to help this, but it didn't have an effect.
>>
>> Does anyone have a suggestion other than looping over a sleep to wait for
>> the tar to succeed?  Waiting just a few seconds to run tar does work, but
>> during a Ceph recovery situation, I can see that needing to be longer and
>> longer.
>>
>>
>> [1] #!/bin/bash
>> cp -R /tmp/17857283/db.sql /cephfs/17857283/
>> sync
>> tar --ignore-failed-read -cvzf /cephfs/17857283.tgz /cephfs/17857283
>>
>> [2] tar: Removing leading `/' from member names
>> /cephfs/17857283/
>> /cephfs/17857283/db.sql
>> tar: /cephfs/17857283: file changed as we read it
>>
> ___
>> ceph-users mailing list
>> ceph-users@lists.ceph.com
>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] CephFS tar archiving immediately after writing

2018-09-07 Thread David Turner
We have an existing workflow that we've moved from one server sharing a
local disk via NFS to secondary servers to all of them mounting CephFS.
The primary server runs a script similar to [1] this, but since we've moved
it into CephFS, we get [2] this error.  We added the sync in there to try
to help this, but it didn't have an effect.

Does anyone have a suggestion other than looping over a sleep to wait for
the tar to succeed?  Waiting just a few seconds to run tar does work, but
during a Ceph recovery situation, I can see that needing to be longer and
longer.


[1] #!/bin/bash
cp -R /tmp/17857283/db.sql /cephfs/17857283/
sync
tar --ignore-failed-read -cvzf /cephfs/17857283.tgz /cephfs/17857283

[2] tar: Removing leading `/' from member names
/cephfs/17857283/
/cephfs/17857283/db.sql
tar: /cephfs/17857283: file changed as we read it
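
For what it's worth, a crude version of the sleep-loop workaround mentioned
above might look like this (attempt count and delay are arbitrary):

#!/bin/bash
cp -R /tmp/17857283/db.sql /cephfs/17857283/
sync
# retry the tar a few times while the recursive stats settle
for attempt in 1 2 3 4 5; do
    tar --ignore-failed-read -cvzf /cephfs/17857283.tgz /cephfs/17857283 && break
    sleep 5
done
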
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Ceph talks from Mounpoint.io

2018-09-06 Thread David Turner
They mentioned that they were going to send the slides to everyone that had
their badges scanned at the conference.  I haven't seen that email come out
yet, though.

On Thu, Sep 6, 2018 at 4:14 PM Gregory Farnum  wrote:

> Unfortunately I don't believe anybody collected the slide files, so they
> aren't available for public access. :(
>
> On Wed, Sep 5, 2018 at 8:16 PM xiangyang yu  wrote:
>
>> Hi  Greg,
>> Where can we download the talk ppt at mountpoint.io?
>>
>> Best  wishes,
>> brandy
>>
>> On Thu, Sep 6, 2018 at 7:05 AM, Gregory Farnum wrote:
>>
> Hey all,
>>> Just wanted to let you know that all the talks from Mountpoint.io are
>>> now available on YouTube. These are reasonably high-quality videos and
>>> include Ceph talks such as:
>>> "Bringing smart device failure prediction to Ceph"
>>> "Pains & Pleasures Testing the Ceph Distributed Storage Stack"
>>> "Ceph cloud object storage: the right way"
>>> "Lessons Learned Scaling Ceph for Public Clouds"
>>> "Making Ceph fast in the face of failure"
>>> "Anatomy of a librados client application"
>>> "Self-aware Ceph: enabling ceph-mgr to control Ceph services via
>>> Kubernetes"
>>> "Doctor! I need Ceph: a journey of open source storage in healthcare‍"
>>> "Rook: Storage Orchestration for a Cloud-Native World"
>>> "What’s new in Ceph"
>>> and possibly some others I've missed (sorry!).
>>>
>>> https://www.youtube.com/playlist?list=PL3P__0CcDTTHn7_QtNauTqpYxLCczR431
>>>
>>> Enjoy!
>>> -Greg
>>>
>> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] help needed

2018-09-06 Thread David Turner
The official Ceph documentation recommendation for a DB partition for a
4TB bluestore OSD would be 160GB each.

The Samsung 850 Pro is not an enterprise-class SSD. A quick search of the ML
will show which SSDs people are using.

As was already suggested, the better option is an HBA as opposed to a RAID
controller. If you are set on your controllers, write-back is fine as long
as you have a BBU. Otherwise you should be using write-through.
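
If you do go bluestore, the rough shape of creating an OSD with its DB on a
separate SSD is something like the following (device names are examples only;
size the DB partition per the docs):

# bluestore with a separate DB device
ceph-volume lvm create --bluestore --data /dev/sdb --block.db /dev/sdk1
# filestore equivalent, with the journal on the SSD
ceph-volume lvm create --filestore --data /dev/sdb --journal /dev/sdk2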

On Thu, Sep 6, 2018, 8:54 AM Muhammad Junaid 
wrote:

> Thanks. Can you please clarify: if we use any other enterprise-class SSD
> for the journal, should we enable the write-back caching available on the
> RAID controller for the journal device, or connect it as write-through? Regards.
>
> On Thu, Sep 6, 2018 at 4:50 PM Marc Roos  wrote:
>
>>
>>
>>
>> Do not use Samsung 850 PRO for journal
>> Just use LSI logic HBA (eg. SAS2308)
>>
>>
>> -Original Message-
>> From: Muhammad Junaid [mailto:junaid.fsd...@gmail.com]
>> Sent: donderdag 6 september 2018 13:18
>> To: ceph-users@lists.ceph.com
>> Subject: [ceph-users] help needed
>>
>> Hi there
>>
>> Hope everyone is doing well. I need urgent help with a Ceph cluster
>> design. We are planning a 3-OSD-node cluster to begin with. Details are
>> as under:
>>
>> Servers: 3 * DELL R720xd
>> OS Drives: 2 2.5" SSD
>> OSD Drives: 10  3.5" SAS 7200rpm 3/4 TB
>> Journal Drives: 2 SSDs, Samsung 850 PRO 256GB each
>> Raid controller: PERC H710 (512MB Cache)
>> OSD Drives: On raid0 mode
>> Journal Drives: JBOD mode
>> Rocks db: On same journal drives
>>
>> My question is: is this setup good for a start? And the critical question
>> is: should we enable write-back caching on the controller for the journal
>> drives? Please suggest. Thanks in advance. Regards.
>>
>> Muhammad Junaid
>>
>>
>>
>> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] [Ceph-community] How to setup Ceph OSD auto boot up on node reboot

2018-09-05 Thread David Turner
The magic sauce to get Filestore OSDs to start on a node reboot is to make
sure that all of your udev magic is correct.  In particular you need to
have the correct UUID set for all partitions.  I haven't dealt with it in a
long time, but I've written up a few good ML responses about it.
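
As a rough sketch of what to check (partition numbers and devices are
examples; the GUIDs are the standard Ceph typecodes the udev rules match on):

# OSD data partition should carry the Ceph OSD typecode
sgdisk --typecode=1:4fbd7e29-9d25-41b8-afd0-062c0ceff05d /dev/sdb
# journal partition should carry the Ceph journal typecode
sgdisk --typecode=1:45b0969e-9b03-4f30-b4c6-b4b80ceff106 /dev/sdk
# re-trigger udev so the ceph-disk rules can activate the OSDs
udevadm trigger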

On Tue, Sep 4, 2018 at 12:38 PM Pardhiv Karri  wrote:

> Hi,
>
> I created a Ceph cluster manually (not using ceph-deploy). When I reboot
> the node, the OSDs don't come back up because the OS doesn't know that it
> needs to bring up the OSDs. I am running this on Ubuntu 16.04. Is there a
> standardized way to initiate ceph-osd start on node reboot?
>
> "sudo start ceph-osd-all" isn't working well, and I don't like the idea of
> putting "sudo start ceph-osd id=1" for each OSD in an rc file. I need to do
> this for both Hammer (Ubuntu 14.04) and Luminous (Ubuntu 16.04).
>
> --
> Thanks,
> *Pardhiv Karri*
> "Rise and Rise again until LAMBS become LIONS"
>
>
> ___
> Ceph-community mailing list
> ceph-commun...@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-community-ceph.com
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] v12.2.8 Luminous released

2018-09-05 Thread David Turner
I upgraded my home cephfs/rbd cluster to 12.2.8 during an OS upgrade to
Ubuntu 18.04 and ProxMox 5.1 (Stretch).  Everything is running well so far.

On Wed, Sep 5, 2018 at 10:21 AM Dan van der Ster  wrote:

> Thanks for the release!
>
> We've updated some test clusters (rbd, cephfs) and it looks good so far.
>
> -- dan
>
>
> On Tue, Sep 4, 2018 at 6:30 PM Abhishek Lekshmanan 
> wrote:
> >
> >
> > We're glad to announce the next point release in the Luminous v12.2.X
> > stable release series. This release contains a range of bugfixes and
> > stability improvements across all the components of ceph. For detailed
> > release notes with links to tracker issues and pull requests, refer to
> > the blog post at http://ceph.com/releases/v12-2-8-released/
> >
> > Upgrade Notes from previous luminous releases
> > -
> >
> > When upgrading from v12.2.5 or v12.2.6 please note that upgrade caveats
> from
> > 12.2.5 will apply to any _newer_ luminous version including 12.2.8.
> Please read
> > the notes at
> https://ceph.com/releases/12-2-7-luminous-released/#upgrading-from-v12-2-6
> >
> > For the cluster that installed the broken 12.2.6 release, 12.2.7 fixed
> the
> > regression and introduced a workaround option `osd distrust data digest
> = true`,
> > but 12.2.7 clusters still generated health warnings like ::
> >
> >   [ERR] 11.288 shard 207: soid
> >   11:1155c332:::rbd_data.207dce238e1f29.0527:head data_digest
> >   0xc8997a5b != data_digest 0x2ca15853
> >
> >
> > 12.2.8 improves the deep scrub code to automatically repair these
> > inconsistencies. Once the entire cluster has been upgraded and then
> fully deep
> > scrubbed, and all such inconsistencies are resolved; it will be safe to
> disable
> > the `osd distrust data digest = true` workaround option.
> >
> > Changelog
> > -
> > * bluestore: set correctly shard for existed Collection (issue#24761,
> pr#22860, Jianpeng Ma)
> > * build/ops: Boost system library is no longer required to compile and
> link example librados program (issue#25054, pr#23202, Nathan Cutler)
> > * build/ops: Bring back diff -y for non-FreeBSD (issue#24396,
> issue#21664, pr#22848, Sage Weil, David Zafman)
> > * build/ops: install-deps.sh fails on newest openSUSE Leap (issue#25064,
> pr#23179, Kyr Shatskyy)
> > * build/ops: Mimic build fails with -DWITH_RADOSGW=0 (issue#24437,
> pr#22864, Dan Mick)
> > * build/ops: order rbdmap.service before remote-fs-pre.target
> (issue#24713, pr#22844, Ilya Dryomov)
> > * build/ops: rpm: silence osd block chown (issue#25152, pr#23313, Dan
> van der Ster)
> > * cephfs-journal-tool: Fix purging when importing an zero-length journal
> (issue#24239, pr#22980, yupeng chen, zhongyan gu)
> > * cephfs: MDSMonitor: uncommitted state exposed to clients/mdss
> (issue#23768, pr#23013, Patrick Donnelly)
> > * ceph-fuse mount failed because no mds (issue#22205, pr#22895, liyan)
> > * ceph-volume add a __release__ string, to help version-conditional
> calls (issue#25170, pr#23331, Alfredo Deza)
> > * ceph-volume: adds test for `ceph-volume lvm list /dev/sda`
> (issue#24784, issue#24957, pr#23350, Andrew Schoen)
> > * ceph-volume: do not use stdin in luminous (issue#25173, issue#23260,
> pr#23367, Alfredo Deza)
> > * ceph-volume enable the ceph-osd during lvm activation (issue#24152,
> pr#23394, Dan van der Ster, Alfredo Deza)
> > * ceph-volume expand on the LVM API to create multiple LVs at different
> sizes (issue#24020, pr#23395, Alfredo Deza)
> > * ceph-volume lvm.activate conditional mon-config on prime-osd-dir
> (issue#25216, pr#23397, Alfredo Deza)
> > * ceph-volume lvm.batch remove non-existent sys_api property
> (issue#34310, pr#23811, Alfredo Deza)
> > * ceph-volume lvm.listing only include devices if they exist
> (issue#24952, pr#23150, Alfredo Deza)
> > * ceph-volume: process.call with stdin in Python 3 fix (issue#24993,
> pr#23238, Alfredo Deza)
> > * ceph-volume: PVolumes.get() should return one PV when using name or
> uuid (issue#24784, pr#23329, Andrew Schoen)
> > * ceph-volume: refuse to zap mapper devices (issue#24504, pr#23374,
> Andrew Schoen)
> > * ceph-volume: tests.functional inherit SSH_ARGS from ansible
> (issue#34311, pr#23813, Alfredo Deza)
> > * ceph-volume tests/functional run lvm list after OSD provisioning
> (issue#24961, pr#23147, Alfredo Deza)
> > * ceph-volume: unmount lvs correctly before zapping (issue#24796,
> pr#23128, Andrew Schoen)
> > * ceph-volume: update batch documentation to explain filestore
> strategies (issu

Re: [ceph-users] Luminous new OSD being over filled

2018-09-04 Thread David Turner
Instead of manually weighting the OSDs, you can use the mgr balancer module
to slowly add the OSDs and balance your cluster at the same time.  I believe
you can control the module by giving it a maximum percentage of misplaced
objects, or other similar metrics, to pace the addition of the OSD while
also preventing your cluster from becoming poorly balanced.
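
Something along these lines, assuming the Luminous balancer module (double
check the commands against your release):

ceph mgr module enable balancer
ceph balancer mode crush-compat
ceph balancer eval      # score the current distribution
ceph balancer on        # let it move data gradually in the background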

On Mon, Sep 3, 2018 at 12:08 PM David C  wrote:

> Hi Marc
>
> I like that approach although I think I'd go in smaller weight increments.
>
> Still a bit confused by the behaviour I'm seeing, it looks like I've got
> things weighted correctly. Redhat's docs recommend doing an OSD at a time
> and I'm sure that's how I've done it on other clusters in the past although
> they would have been running older versions.
>
> Thanks,
>
> On Mon, Sep 3, 2018 at 1:45 PM Marc Roos  wrote:
>
>>
>>
>> I am adding a node like this, I think it is more efficient, because in
>> your case you will have data being moved within the added node (between
>> the newly added osd's there). So far no problems with this.
>>
>> Maybe limit your
>> ceph tell osd.* injectargs --osd_max_backfills=X
>> Because pg's being moved are taking space until the move is completed.
>>
>> sudo -u ceph ceph osd crush reweight osd.23 1 (all osd's in the node)
>> sudo -u ceph ceph osd crush reweight osd.24 1
>> sudo -u ceph ceph osd crush reweight osd.25 1
>> sudo -u ceph ceph osd crush reweight osd.26 1
>> sudo -u ceph ceph osd crush reweight osd.27 1
>> sudo -u ceph ceph osd crush reweight osd.28 1
>> sudo -u ceph ceph osd crush reweight osd.29 1
>>
>> And then after recovery
>>
>> sudo -u ceph ceph osd crush reweight osd.23 2
>> sudo -u ceph ceph osd crush reweight osd.24 2
>> sudo -u ceph ceph osd crush reweight osd.25 2
>> sudo -u ceph ceph osd crush reweight osd.26 2
>> sudo -u ceph ceph osd crush reweight osd.27 2
>> sudo -u ceph ceph osd crush reweight osd.28 2
>> sudo -u ceph ceph osd crush reweight osd.29 2
>>
>> Etc etc
>>
>>
>> -Original Message-
>> From: David C [mailto:dcsysengin...@gmail.com]
>> Sent: maandag 3 september 2018 14:34
>> To: ceph-users
>> Subject: [ceph-users] Luminous new OSD being over filled
>>
>> Hi all
>>
>>
>> Trying to add a new host to a Luminous cluster, I'm doing one OSD at a
>> time. I've only added one so far but it's getting too full.
>>
>> The drive is the same size (4TB) as all others in the cluster, all OSDs
>> have crush weight of 3.63689. Average usage on the drives is 81.70%
>>
>>
>> With the new OSD I start with a crush weight 0 and steadily increase.
>> It's currently crush weight 3.0 and is 94.78% full. If I increase to
>> 3.63689 it's going to hit too full.
>>
>>
>> It's been a while since I've added a host to an existing cluster. Any
>> idea why the drive is getting too full? Do I just have to leave this one
>> with a lower crush weight and then continue adding the drives and then
>> eventually even out the crush weights?
>>
>> Thanks
>> David
>>
>>
>>
>>
>>
>>
>> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Luminous RGW errors at start

2018-09-04 Thread David Turner
I was confused about what could be causing this until Janne's email.  I think
they're correct that the cluster is preventing pool creation due to too
many PGs per OSD.  Double-check how many PGs you have in each pool and what
your defaults are for that.
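
A few quick things to check (the mon name is a placeholder):

ceph osd df               # the PGS column shows PGs per OSD
ceph osd pool ls detail   # pg_num / pgp_num per pool
ceph daemon mon.<name> config show | grep mon_max_pg_per_osd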

On Mon, Sep 3, 2018 at 7:19 AM Janne Johansson  wrote:

> Did you change the default pg_num or pgp_num so that the pools which did get
> created pushed you past mon_max_pg_per_osd?
>
>
> Den fre 31 aug. 2018 kl 17:20 skrev Robert Stanford <
> rstanford8...@gmail.com>:
>
>>
>>  I installed a new Luminous cluster.  Everything is fine so far.  Then I
>> tried to start RGW and got this error:
>>
>> 2018-08-31 15:15:41.998048 7fc350271e80  0 rgw_init_ioctx ERROR:
>> librados::Rados::pool_create returned (34) Numerical result out of range
>> (this can be due to a pool or placement group misconfiguration, e.g. pg_num
>> < pgp_num or mon_max_pg_per_osd exceeded)
>> 2018-08-31 15:15:42.005732 7fc350271e80 -1 Couldn't init storage provider
>> (RADOS)
>>
>>  I notice that the only pools that exist are the data and index RGW pools
>> (no user or log pools like on Jewel).  What is causing this?
>>
>>  Thank you
>>  R
>> ___
>> ceph-users mailing list
>> ceph-users@lists.ceph.com
>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>
>
>
> --
> May the most significant bit of your life be positive.
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Ceph Luminous - journal setting

2018-09-04 Thread David Turner
Are you planning on using bluestore or filestore?  The settings for
filestore haven't changed.  If you're planning to use bluestore there is a
lot of documentation in the ceph docs as well as a wide history of
questions like this on the ML.
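
Roughly speaking, for filestore nothing changes from Jewel, e.g. (size is
only illustrative):

[osd]
osd journal size = 10240   # MB, filestore only

# bluestore has no journal; the fast device is given at OSD creation time
# instead, e.g.:
# ceph-volume lvm create --bluestore --data /dev/sdb --block.db /dev/sdk1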

On Mon, Sep 3, 2018 at 5:24 AM M Ranga Swami Reddy 
wrote:

> Hi - I am using the Ceph Luminous release. What OSD journal settings are
> needed here?
> NOTE: I used SSDs for the journal up to the Jewel release.
>
> Thanks
> Swami
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Luminous new OSD being over filled

2018-09-03 Thread David C
Hi Marc

I like that approach although I think I'd go in smaller weight increments.

Still a bit confused by the behaviour I'm seeing, it looks like I've got
things weighted correctly. Redhat's docs recommend doing an OSD at a time
and I'm sure that's how I've done it on other clusters in the past although
they would have been running older versions.

Thanks,

On Mon, Sep 3, 2018 at 1:45 PM Marc Roos  wrote:

>
>
> I am adding a node like this, I think it is more efficient, because in
> your case you will have data being moved within the added node (between
> the newly added osd's there). So far no problems with this.
>
> Maybe limit your
> ceph tell osd.* injectargs --osd_max_backfills=X
> Because pg's being moved are taking space until the move is completed.
>
> sudo -u ceph ceph osd crush reweight osd.23 1 (all osd's in the node)
> sudo -u ceph ceph osd crush reweight osd.24 1
> sudo -u ceph ceph osd crush reweight osd.25 1
> sudo -u ceph ceph osd crush reweight osd.26 1
> sudo -u ceph ceph osd crush reweight osd.27 1
> sudo -u ceph ceph osd crush reweight osd.28 1
> sudo -u ceph ceph osd crush reweight osd.29 1
>
> And then after recovery
>
> sudo -u ceph ceph osd crush reweight osd.23 2
> sudo -u ceph ceph osd crush reweight osd.24 2
> sudo -u ceph ceph osd crush reweight osd.25 2
> sudo -u ceph ceph osd crush reweight osd.26 2
> sudo -u ceph ceph osd crush reweight osd.27 2
> sudo -u ceph ceph osd crush reweight osd.28 2
> sudo -u ceph ceph osd crush reweight osd.29 2
>
> Etc etc
>
>
> -Original Message-
> From: David C [mailto:dcsysengin...@gmail.com]
> Sent: maandag 3 september 2018 14:34
> To: ceph-users
> Subject: [ceph-users] Luminous new OSD being over filled
>
> Hi all
>
>
> Trying to add a new host to a Luminous cluster, I'm doing one OSD at a
> time. I've only added one so far but it's getting too full.
>
> The drive is the same size (4TB) as all others in the cluster, all OSDs
> have crush weight of 3.63689. Average usage on the drives is 81.70%
>
>
> With the new OSD I start with a crush weight 0 and steadily increase.
> It's currently crush weight 3.0 and is 94.78% full. If I increase to
> 3.63689 it's going to hit too full.
>
>
> It's been a while since I've added a host to an existing cluster. Any
> idea why the drive is getting too full? Do I just have to leave this one
> with a lower crush weight and then continue adding the drives and then
> eventually even out the crush weights?
>
> Thanks
> David
>
>
>
>
>
>
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Luminous new OSD being over filled

2018-09-03 Thread David C
Hi all

Trying to add a new host to a Luminous cluster, I'm doing one OSD at a
time. I've only added one so far but it's getting too full.

The drive is the same size (4TB) as all others in the cluster, all OSDs
have crush weight of 3.63689. Average usage on the drives is 81.70%

With the new OSD I start with a crush weight 0 and steadily increase. It's
currently crush weight 3.0 and is 94.78% full. If I increase to 3.63689
it's going to hit too full.

It's been a while since I've added a host to an existing cluster. Any idea
why the drive is getting too full? Do I just have to leave this one with a
lower crush weight and then continue adding the drives and then eventually
even out the crush weights?

Thanks
David
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Luminous missing osd_backfill_full_ratio

2018-09-03 Thread David C
In the end it was because I hadn't completed the upgrade with "ceph osd
require-osd-release luminous".  After setting that, I had the default
backfillfull ratio (0.9 I think) and was able to change it with "ceph osd
set-backfillfull-ratio".
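
For the archives, the sequence that sorted it out was roughly (the ratio
value is an example):

ceph osd require-osd-release luminous
ceph osd set-backfillfull-ratio 0.90
ceph osd dump | grep full_ratio   # confirm the ratios are now populated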

Potential gotcha for a Jewel -> Luminous upgrade: if you delay the
"...require-osd-release luminous" step for whatever reason, it appears to
leave you with no backfillfull limit at all.

Still having a bit of an issue with new OSDs over filling but will start a
new thread for that

Cheers,

On Thu, Aug 30, 2018 at 10:34 PM David Turner  wrote:

> This moved to the PG map in luminous. I think it might have been there in
> Jewel as well.
>
> http://docs.ceph.com/docs/luminous/man/8/ceph/#pg
> ceph pg set_full_ratio <ratio>
> ceph pg set_backfillfull_ratio <ratio>
> ceph pg set_nearfull_ratio <ratio>
>
>
> On Thu, Aug 30, 2018, 1:57 PM David C  wrote:
>
>> Hi All
>>
>> I feel like this is going to be a silly query with a hopefully simple
>> answer. I don't seem to have the osd_backfill_full_ratio config option on
>> my OSDs and can't inject it. This is a Luminous 12.2.1 cluster that was
>> upgraded from Jewel.
>>
>> I added an OSD to the cluster and woke up the next day to find the OSD
>> had hit OSD_FULL. I'm pretty sure the reason it filled up was because the
>> new host was weighted too high (I initially added two OSDs but decided to
>> only backfill one at a time). The thing that surprised me was why a
>> backfill full ratio didn't kick in to prevent this from happening.
>>
>> One potentially key piece of info is I haven't run the "ceph osd
>> require-osd-release luminous" command yet (I wasn't sure what impact this
>> would have so was waiting for a window with quiet client I/O).
>>
>> ceph osd dump is showing zero for all full ratios:
>>
>> # ceph osd dump | grep full_ratio
>> full_ratio 0
>> backfillfull_ratio 0
>> nearfull_ratio 0
>>
>> Do I simply need to run "ceph osd set-backfillfull-ratio"? Or am I missing
>> something here? I don't understand why I don't have a default backfillfull
>> ratio on this cluster.
>>
>> Thanks,
>>
>>
>>
>> ___
>> ceph-users mailing list
>> ceph-users@lists.ceph.com
>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] "no valid command found" when running "ceph-deploy osd create"

2018-09-02 Thread David Wahler
On Sun, Sep 2, 2018 at 1:31 PM Alfredo Deza  wrote:
>
> On Sun, Sep 2, 2018 at 12:00 PM, David Wahler  wrote:
> > Ah, ceph-volume.log pointed out the actual problem:
> >
> > RuntimeError: Cannot use device (/dev/storage/bluestore). A vg/lv path
> > or an existing device is needed
>
> That is odd, is it possible that the error log wasn't the one that
> matched what you saw on ceph-deploy's end?
>
> Usually ceph-deploy will just receive whatever ceph-volume produced.

I tried again, running ceph-volume directly this time, just to see if
I had mixed anything up. It looks like ceph-deploy is correctly
reporting the output of ceph-volume. The problem is that ceph-volume
only writes the relevant error message to the log file, and not to its
stdout/stderr.

Console output:

rock64@rockpro64-1:~/my-cluster$ sudo ceph-volume --cluster ceph lvm
create --bluestore --data /dev/storage/foobar
Running command: /usr/bin/ceph-authtool --gen-print-key
Running command: /usr/bin/ceph --cluster ceph --name
client.bootstrap-osd --keyring
/var/lib/ceph/bootstrap-osd/ceph.keyring -i - osd new
e7dd6d45-b556-461c-bad1-83d98a5a1afa
--> Was unable to complete a new OSD, will rollback changes
Running command: /usr/bin/ceph --cluster ceph --name
client.bootstrap-osd --keyring
/var/lib/ceph/bootstrap-osd/ceph.keyring osd purge-new osd.1
--yes-i-really-mean-it
 stderr: no valid command found; 10 closest matches:
[...etc...]

ceph-volume.log:

[2018-09-02 18:49:21,415][ceph_volume.main][INFO  ] Running command:
ceph-volume --cluster ceph lvm create --bluestore --data
/dev/storage/foobar
[2018-09-02 18:49:21,423][ceph_volume.process][INFO  ] Running
command: /usr/bin/ceph-authtool --gen-print-key
[2018-09-02 18:49:26,664][ceph_volume.process][INFO  ] stdout
AQCxMIxb+SezJRAAGAP/HHtHLVbciSQnZ/c/qw==
[2018-09-02 18:49:26,668][ceph_volume.process][INFO  ] Running
command: /usr/bin/ceph --cluster ceph --name client.bootstrap-osd
--keyring /var/lib/ceph/bootstrap-osd/ceph.keyring -i - osd new
e7dd6d45-b556-461c-bad1-83d98a5a1afa
[2018-09-02 18:49:27,685][ceph_volume.process][INFO  ] stdout 1
[2018-09-02 18:49:27,686][ceph_volume.process][INFO  ] Running
command: /bin/lsblk --nodeps -P -o
NAME,KNAME,MAJ:MIN,FSTYPE,MOUNTPOINT,LABEL,UUID,RO,RM,MODEL,SIZE,STATE,OWNER,GROUP,MODE,ALIGNMENT,PHY-SEC,LOG-SEC,ROTA,SCHED,TYPE,DISC-ALN,DISC-GRAN,DISC-MAX,DISC-ZERO,PKNAME,PARTLABEL
/dev/storage/foobar
[2018-09-02 18:49:27,707][ceph_volume.process][INFO  ] stdout
NAME="storage-foobar" KNAME="dm-1" MAJ:MIN="253:1" FSTYPE=""
MOUNTPOINT="" LABEL="" UUID="" RO="0" RM="0" MODEL="" SIZE="100G"
STATE="running" OWNER="root" GROUP="disk" MODE="brw-rw"
ALIGNMENT="0" PHY-SEC="4096" LOG-SEC="512" ROTA="1" SCHED=""
TYPE="lvm" DISC-ALN="0" DISC-GRAN="0B" DISC-MAX="0B" DISC-ZERO="0"
PKNAME="" PARTLABEL=""
[2018-09-02 18:49:27,708][ceph_volume.process][INFO  ] Running
command: /bin/lsblk --nodeps -P -o
NAME,KNAME,MAJ:MIN,FSTYPE,MOUNTPOINT,LABEL,UUID,RO,RM,MODEL,SIZE,STATE,OWNER,GROUP,MODE,ALIGNMENT,PHY-SEC,LOG-SEC,ROTA,SCHED,TYPE,DISC-ALN,DISC-GRAN,DISC-MAX,DISC-ZERO,PKNAME,PARTLABEL
/dev/storage/foobar
[2018-09-02 18:49:27,720][ceph_volume.process][INFO  ] stdout
NAME="storage-foobar" KNAME="dm-1" MAJ:MIN="253:1" FSTYPE=""
MOUNTPOINT="" LABEL="" UUID="" RO="0" RM="0" MODEL="" SIZE="100G"
STATE="running" OWNER="root" GROUP="disk" MODE="brw-rw"
ALIGNMENT="0" PHY-SEC="4096" LOG-SEC="512" ROTA="1" SCHED=""
TYPE="lvm" DISC-ALN="0" DISC-GRAN="0B" DISC-MAX="0B" DISC-ZERO="0"
PKNAME="" PARTLABEL=""
[2018-09-02 18:49:27,720][ceph_volume.devices.lvm.prepare][ERROR ] lvm
prepare was unable to complete
Traceback (most recent call last):
  File "/usr/lib/python2.7/dist-packages/ceph_volume/devices/lvm/prepare.py",
line 216, in safe_prepare
self.prepare(args)
  File "/usr/lib/python2.7/dist-packages/ceph_volume/decorators.py",
line 16, in is_root
return func(*a, **kw)
  File "/usr/lib/python2.7/dist-packages/ceph_volume/devices/lvm/prepare.py",
line 283, in prepare
block_lv = self.prepare_device(args.data, 'block', cluster_fsid, osd_fsid)
  File "/usr/lib/python2.7/dist-packages/ceph_volume/devices/lvm/prepare.py",
line 206, in prepare_device
raise RuntimeError(' '.join(error))
RuntimeError: Cannot use device (/dev/storage/foobar). A vg/lv path or
an existing device is needed
[2018-09-02 18:49:27,722][ceph_volume.devices.lvm.prepare][INFO

Re: [ceph-users] Help Basically..

2018-09-02 Thread David Turner
Agreed on not zapping the disks until your cluster is healthy again. Marking
them out and seeing how healthy you can get in the meantime is a good idea.

On Sun, Sep 2, 2018, 1:18 PM Ronny Aasen  wrote:

> On 02.09.2018 17:12, Lee wrote:
> > Should I just out the OSD's first or completely zap them and recreate?
> > Or delete and let the cluster repair itself?
> >
> > On the second node, when it started back up I had problems with the
> > journals for IDs 5 and 7; those were recreated, while all the rest are
> > still the originals.
> >
> > I know that some PG's are on both 24 and 5 and 7 ie.
>
>
> Personally, I would never wipe a disk until the cluster is HEALTH_OK.
> Out them from the cluster, and if you need the slots for healthy disks
> you can remove them physically, but label and store them together with the
> journal until you are HEALTH_OK.
>
> kind regards
> Ronny Aasen
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] "no valid command found" when running "ceph-deploy osd create"

2018-09-02 Thread David Wahler
Ah, ceph-volume.log pointed out the actual problem:

RuntimeError: Cannot use device (/dev/storage/bluestore). A vg/lv path
or an existing device is needed

When I changed "--data /dev/storage/bluestore" to "--data
storage/bluestore", everything worked fine.

I agree that the ceph-deploy logs are a bit confusing. I submitted a
PR to add a brief note to the quick-start guide, in case anyone else
makes the same mistake: https://github.com/ceph/ceph/pull/23879

Thanks for the assistance!

-- David

On Sun, Sep 2, 2018 at 7:44 AM Alfredo Deza  wrote:
>
> There should be useful logs from ceph-volume in
> /var/log/ceph/ceph-volume.log that might show a bit more here.
>
> I would also try the command that fails directly on the server (sans
> ceph-deploy) to see what is it that is actually failing. Seems like
> the ceph-deploy log output is a bit out of order (some race condition
> here maybe)
>
>
> On Sun, Sep 2, 2018 at 2:53 AM, David Wahler  wrote:
> > Hi all,
> >
> > I'm attempting to get a small Mimic cluster running on ARM, starting
> > with a single node. Since there don't seem to be any Debian ARM64
> > packages in the official Ceph repository, I had to build from source,
> > which was fairly straightforward.
> >
> > After installing the .deb packages that I built and following the
> > quick-start guide
> > (http://docs.ceph.com/docs/mimic/start/quick-ceph-deploy/), things
> > seemed to be working fine at first, but I got this error when
> > attempting to create an OSD:
> >
> > rock64@rockpro64-1:~/my-cluster$ ceph-deploy osd create --data
> > /dev/storage/bluestore rockpro64-1
> > [ceph_deploy.conf][DEBUG ] found configuration file at:
> > /home/rock64/.cephdeploy.conf
> > [ceph_deploy.cli][INFO  ] Invoked (2.0.1): /usr/bin/ceph-deploy osd
> > create --data /dev/storage/bluestore rockpro64-1
> > [ceph_deploy.cli][INFO  ] ceph-deploy options:
> > [ceph_deploy.cli][INFO  ]  verbose   : False
> > [ceph_deploy.cli][INFO  ]  bluestore : None
> > [ceph_deploy.cli][INFO  ]  cd_conf   :
> > 
> > [ceph_deploy.cli][INFO  ]  cluster   : ceph
> > [ceph_deploy.cli][INFO  ]  fs_type   : xfs
> > [ceph_deploy.cli][INFO  ]  block_wal : None
> > [ceph_deploy.cli][INFO  ]  default_release   : False
> > [ceph_deploy.cli][INFO  ]  username  : None
> > [ceph_deploy.cli][INFO  ]  journal   : None
> > [ceph_deploy.cli][INFO  ]  subcommand: create
> > [ceph_deploy.cli][INFO  ]  host  : rockpro64-1
> > [ceph_deploy.cli][INFO  ]  filestore : None
> > [ceph_deploy.cli][INFO  ]  func  :  > osd at 0x7fa9ca0c80>
> > [ceph_deploy.cli][INFO  ]  ceph_conf : None
> > [ceph_deploy.cli][INFO  ]  zap_disk  : False
> > [ceph_deploy.cli][INFO  ]  data  :
> > /dev/storage/bluestore
> > [ceph_deploy.cli][INFO  ]  block_db  : None
> > [ceph_deploy.cli][INFO  ]  dmcrypt   : False
> > [ceph_deploy.cli][INFO  ]  overwrite_conf: False
> > [ceph_deploy.cli][INFO  ]  dmcrypt_key_dir   :
> > /etc/ceph/dmcrypt-keys
> > [ceph_deploy.cli][INFO  ]  quiet : False
> > [ceph_deploy.cli][INFO  ]  debug : False
> > [ceph_deploy.osd][DEBUG ] Creating OSD on cluster ceph with data
> > device /dev/storage/bluestore
> > [rockpro64-1][DEBUG ] connection detected need for sudo
> > [rockpro64-1][DEBUG ] connected to host: rockpro64-1
> > [rockpro64-1][DEBUG ] detect platform information from remote host
> > [rockpro64-1][DEBUG ] detect machine type
> > [rockpro64-1][DEBUG ] find the location of an executable
> > [ceph_deploy.osd][INFO  ] Distro info: debian buster/sid sid
> > [ceph_deploy.osd][DEBUG ] Deploying osd to rockpro64-1
> > [rockpro64-1][DEBUG ] write cluster configuration to 
> > /etc/ceph/{cluster}.conf
> > [rockpro64-1][WARNIN] osd keyring does not exist yet, creating one
> > [rockpro64-1][DEBUG ] create a keyring file
> > [rockpro64-1][DEBUG ] find the location of an executable
> > [rockpro64-1][INFO  ] Running command: sudo /usr/sbin/ceph-volume
> > --cluster ceph lvm create --bluestore --data /dev/storage/bluestore
> > [rockpro64-1][DEBUG ] Running command: /usr/bin/ceph-authtool 
> > --gen-print-key
> > [rockpro64-1][WARNIN] -->  RuntimeError: command returned non-zero
> >

Re: [ceph-users] Help Basically..

2018-09-02 Thread David Turner
The problem is with never getting a successful run of `ceph-osd
--flush-journal` on the old SSD journal drive. All of the OSDs that used
the dead journal need to be removed from the cluster, wiped, and added back
in. The data on them is not 100% consistent because the old journal died.
Any write that made it to the journal but not the disk is bad.

Add on top of that your decision to run with replica size = 2 and min_size = 1,
and anything that happens in your cluster becomes very dangerous for data loss.
Seeing as you had 2 nodes fail so close to each other, there is a very real
possibility that you will have some data loss from this.

Regardless, your first step is to remove the OSDs that were on the failed
journal. They are poison in your cluster.
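
The usual removal sequence per affected OSD looks something like this (osd.20
as an example; use whatever your init system provides to stop the daemon):

ceph osd out 20
systemctl stop ceph-osd@20     # or "service ceph stop osd.20" on older setups
ceph osd crush remove osd.20
ceph auth del osd.20
ceph osd rm 20
# then zap and re-create the OSD once the cluster has settled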

On Sun, Sep 2, 2018, 10:51 AM Lee  wrote:

> I followed:
>
> $ journal_uuid=$(sudo cat /var/lib/ceph/osd/ceph-0/journal_uuid)
> $ sudo sgdisk --new=1:0:+20480M --change-name=1:'ceph journal'
> --partition-guid=1:$journal_uuid
> --typecode=1:45b0969e-9b03-4f30-b4c6-b4b80ceff106 --mbrtogpt -- /dev/sdk
>
> Then
>
> $ sudo ceph-osd --mkjournal -i 20
> $ sudo service ceph start osd.20
>
> From 
> https://www.sebastien-han.fr/blog/2014/11/27/ceph-recover-osds-after-ssd-journal-failure/
>
> Which they all started without a problem.
>
>
> On Sun, 2 Sep 2018 at 15:43, David Turner  wrote:
>
>> It looks like osds on the first failed node are having problems. What
>> commands did you run to bring it back online?
>>
>> On Sun, Sep 2, 2018, 10:27 AM Lee  wrote:
>>
>>> Ok I have a lot in the health detail...
>>>
>>> root@node31-a4:~# ceph health detail
>>> HEALTH_ERR 64 pgs backfill; 27 pgs backfill_toofull; 39 pgs backfilling;
>>> 26 pgs degraded; 4 pgs down; 31 pgs incomplete; 1 pgs inconsistent; 12 pgs
>>> recovery_wait; 1 pgs stale; 26 pgs stuck degraded; 31 pgs stuck inactive; 1
>>> pgs stuck stale; 161 pgs stuck unclean; 9 pgs stuck undersized; 9 pgs
>>> undersized; 726 requests are blocked > 32 sec; 9 osds have slow requests;
>>> recovery 59636/5032695 objects degraded (1.185%); recovery 1280976/5032695
>>> objects misplaced (25.453%); 1 scrub errors; noscrub,nodeep-scrub flag(s)
>>> set
>>> pg 2.2a is stuck inactive for 97629.478505, current state incomplete,
>>> last acting [24,5]
>>> pg 2.b0 is stuck inactive for 98000.688979, current state incomplete,
>>> last acting [24,7]
>>> pg 9.42 is stuck inactive for 108836.103738, current state incomplete,
>>> last acting [31,12]
>>> pg 9.de is stuck inactive since forever, current state incomplete, last
>>> acting [6,5]
>>> pg 2.75 is stuck inactive since forever, current state down+incomplete,
>>> last acting [7,15]
>>> pg 9.dc is stuck inactive for 113491.800208, current state incomplete,
>>> last acting [6,7]
>>> pg 2.74 is stuck inactive for 97658.382960, current state incomplete,
>>> last acting [13,5]
>>> pg 9.1e is stuck inactive since forever, current state incomplete, last
>>> acting [7,15]
>>> pg 2.15 is stuck inactive since forever, current state incomplete, last
>>> acting [7,31]
>>> pg 11.1c is stuck inactive since forever, current state down+incomplete,
>>> last acting [6,7]
>>> pg 2.a1 is stuck inactive for 98785.26, current state incomplete,
>>> last acting [14,12]
>>> pg 9.d8 is stuck inactive for 115082.575098, current state
>>> down+incomplete, last acting [21,5]
>>> pg 9.a8 is stuck inactive for 118575.035210, current state incomplete,
>>> last acting [14,7]
>>> pg 9.78 is stuck inactive since forever, current state incomplete, last
>>> acting [5,24]
>>> pg 2.a2 is stuck inactive since forever, current state incomplete, last
>>> acting [5,13]
>>> pg 7.16 is stuck inactive since forever, current state incomplete, last
>>> acting [6,7]
>>> pg 2.13 is stuck inactive since forever, current state incomplete, last
>>> acting [7,10]
>>> pg 9.f5 is stuck inactive for 103009.439003, current state incomplete,
>>> last acting [18,5]
>>> pg 2.d is stuck inactive since forever, current state incomplete, last
>>> acting [5,10]
>>> pg 9.5 is stuck inactive since forever, current state incomplete, last
>>> acting [5,18]
>>> pg 9.3 is stuck inactive since forever, current state incomplete, last
>>> acting [7,15]
>>> pg 9.fc is stuck inactive for 201476.092908, current state incomplete,
>>> last acting [13,5]
>>> pg 11.33 is stuck inactive since forever, current state down+incomplete,
>>> last acting [7,6]
>>> pg 9.3f is stu
