[ceph-users] ceph-bluestore-tool failed

2018-10-30 Thread ST Wong (ITSC)
Hi all,

We deployed a testing Mimic Ceph cluster using bluestore. We can't run
ceph-bluestore-tool on an OSD; it fails with the following error:

---
# ceph-bluestore-tool show-label --dev *device*
2018-10-31 09:42:01.712 7f3ac5bb4a00 -1 auth: unable to find a keyring on 
/etc/ceph/ceph.client.admin.keyring,/etc/ceph/ceph.keyring,/etc/ceph/keyring,/etc/ceph/keyring.bin,:
 (2) No such file or directory
2018-10-31 09:42:01.716 7f3ac5bb4a00 -1 monclient: authenticate NOTE: no 
keyring found; disabled cephx authentication
---

Shall we run this command on the admin server with the corresponding keyring? But
ceph-bluestore-tool is in the ceph-osd package and doesn't exist on the admin server.
Did we miss anything?
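
For reference, a sketch of how the tool is typically invoked locally on an OSD host
(the device path and OSD directory below are placeholders, not our actual layout):

# ceph-bluestore-tool show-label --dev /dev/sdb
# ceph-bluestore-tool show-label --path /var/lib/ceph/osd/ceph-0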

Thanks a lot.
/st wong
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Using FC with LIO targets

2018-10-30 Thread Mike Christie
On 10/28/2018 03:18 AM, Frédéric Nass wrote:
> Hello Mike, Jason,
> 
> Assuming we adapt the current LIO configuration scripts and put QLogic HBAs 
> in our SCSI targets, could we use FC instead of iSCSI as a SCSI transport 
> protocol with LIO ? Would this still work with multipathing and ALUA ?
> Do you see any issues coming from this type of configuration ?

The FC drivers have a similar problem to iscsi.

The general problem is making sure the transport paths are flushed when
we failover/back. I had thought using explicit failover would fix this,
but for vSphere HA types of setups, and for a single host with multiple
initiator ports connected to the same target port, we still hit issues.

For iscsi, I am working on this patchset (maybe half is now merged but
the patchset is larger due to some other requested fixes in sort of
related code):

https://www.spinics.net/lists/target-devel/msg16943.html

where from userspace we can flush the iscsi paths when performing
failover so we know there are no stale IOs in that iscsi/code path.

For the FC drivers I was planning something similar, where we would send an FC
echo like we do for the iscsi nop.

If you are asking if you can just drop in one of the FC target drivers
into the ceph-iscsi-config/cli/tcmu-runner stuff then it would not work,
because there are a lot of places where iscsi references are hard coded now.



> Best regards,
> Frédéric.
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> 

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Removing MDS

2018-10-30 Thread Rhian Resnick
That is what I thought. I am increasing debug to see where we are getting stuck.
I am not sure if it is an issue deactivating or an rdlock issue.


Thanks. If we discover more, we will post a question with details.


Rhian Resnick

Associate Director Research Computing

Enterprise Systems

Office of Information Technology


Florida Atlantic University

777 Glades Road, CM22, Rm 173B

Boca Raton, FL 33431

Phone 561.297.2647

Fax 561.297.0222




From: Patrick Donnelly 
Sent: Tuesday, October 30, 2018 8:40 PM
To: Rhian Resnick
Cc: Ceph Users
Subject: Re: [ceph-users] Removing MDS

On Tue, Oct 30, 2018 at 4:05 PM Rhian Resnick  wrote:
> We are running into issues deactivating mds ranks. Is there a way to safely 
> forcibly remove a rank?

No, there's no "safe" way to force the issue. The rank needs to come
back, flush its journal, and then complete its deactivation. To get
more help, you need to describe your environment, version of Ceph in
use, relevant log snippets, etc.

--
Patrick Donnelly
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Removing MDS

2018-10-30 Thread Patrick Donnelly
On Tue, Oct 30, 2018 at 4:05 PM Rhian Resnick  wrote:
> We are running into issues deactivating mds ranks. Is there a way to safely 
> forcibly remove a rank?

No, there's no "safe" way to force the issue. The rank needs to come
back, flush its journal, and then complete its deactivation. To get
more help, you need to describe your environment, version of Ceph in
use, relevant log snippets, etc.
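
For example, something along these lines is usually a good starting point
(standard Ceph CLI; the MDS name below is a placeholder):

ceph versions
ceph fs status
ceph tell mds.<name> injectargs '--debug_mds 20 --debug_ms 1'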

-- 
Patrick Donnelly
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Using FC with LIO targets

2018-10-30 Thread Jason Dillaman
(CCing Mike since he knows more than me)
On Sun, Oct 28, 2018 at 4:19 AM Frédéric Nass
 wrote:
>
> Hello Mike, Jason,
>
> Assuming we adapt the current LIO configuration scripts and put QLogic HBAs 
> in our SCSI targets, could we use FC instead of iSCSI as a SCSI transport 
> protocol with LIO ? Would this still work with multipathing and ALUA ?
> Do you see any issues coming from this type of configuration ?
>
> Best regards,
> Frédéric.
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



-- 
Jason
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] RBD: create imaged with qemu

2018-10-30 Thread Jason Dillaman
Your use of "sudo" for the rados CLI tool makes me wonder if perhaps
the "nstcc0" user cannot read "/etc/ceph/ceph.conf" or
"/etc/ceph/ceph.admin.keyring". If that's not the case, what version
of qemu-img are you using?

$ rpm -qa | grep qemu-img
qemu-img-2.11.2-4.fc28.x86_64
$ qemu-img create -f raw rbd:rbd/test 1G
Formatting 'rbd:rbd/test', fmt=raw size=1073741824
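
If it does turn out to be a credentials issue, qemu-img can also be given an
explicit user id and conf file in the rbd spec — a sketch only; the id and path
are assumptions:

$ qemu-img create -f raw rbd:quick_rbd_test/own_image:id=admin:conf=/etc/ceph/ceph.conf 5G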

On Tue, Oct 30, 2018 at 11:56 AM Liu, Changcheng
 wrote:
>
> Hi all,
>
>  I follow below guide to create images with qemu-rbd: qemu-img create -f 
> raw rbd:quick_rbd_test/own_image 5G;
>
> http://docs.ceph.com/docs/master/rbd/qemu-rbd/
>
> However, it always shows “connect error”. Does anyone know how to resolve the 
> problem?
>
>
>
> The info is below:
>
> nstcc0@nstcloudcc0:lst_deploy$ sudo rados lspools
>
> .rgw.root
>
> default.rgw.control
>
> default.rgw.meta
>
> default.rgw.log
>
> quick_rbd_test
>
>
>
> nstcc0@nstcloudcc0:lst_deploy$ qemu-img create -f raw 
> rbd:quick_rbd_test/own_image 5G; dmesg | grep -v 'UFW' | tail -n 5
>
> Formatting 'rbd:quick_rbd_test/own_image', fmt=raw size=5368709120
>
> qemu-img: rbd:quick_rbd_test/own_image: error connecting
>
>
>
> [30696.520273] virbr0: port 1(virbr0-nic) entered blocking state
>
> [30696.520278] virbr0: port 1(virbr0-nic) entered listening state
>
> [30698.472362] virbr0: port 1(virbr0-nic) entered disabled state
>
> [30698.478503] device virbr0-nic left promiscuous mode
>
> [30698.478569] virbr0: port 1(virbr0-nic) entered disabled state
>
>
>
>
>
> nstcc0@nstcloudcc0:lst_deploy$ ifconfig
>
> eno1  Link encap:Ethernet  HWaddr 00:1e:67:94:65:ae
>
>   inet addr:10.239.48.91  Bcast:10.239.48.255  Mask:255.255.255.0
>
>   inet6 addr: fe80::651e:6989:e32c:b0b2/64 Scope:Link
>
>   UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
>
>   RX packets:1649029 errors:0 dropped:0 overruns:0 frame:0
>
>   TX packets:870237 errors:0 dropped:0 overruns:0 carrier:0
>
>   collisions:0 txqueuelen:1000
>
>   RX bytes:1492566309 (1.4 GB)  TX bytes:23775 (237.7 MB)
>
>
>
> loLink encap:Local Loopback
>
>   inet addr:127.0.0.1  Mask:255.0.0.0
>
>   inet6 addr: ::1/128 Scope:Host
>
>   UP LOOPBACK RUNNING  MTU:65536  Metric:1
>
>   RX packets:52119 errors:0 dropped:0 overruns:0 frame:0
>
>   TX packets:52119 errors:0 dropped:0 overruns:0 carrier:0
>
>   collisions:0 txqueuelen:1000
>
>   RX bytes:84885344 (84.8 MB)  TX bytes:84885344 (84.8 MB)
>
>
>
> virbr0Link encap:Ethernet  HWaddr 00:00:00:00:00:00
>
>   inet addr:192.168.122.1  Bcast:192.168.122.255  Mask:255.255.255.0
>
>   UP BROADCAST MULTICAST  MTU:1500  Metric:1
>
>   RX packets:0 errors:0 dropped:0 overruns:0 frame:0
>
>   TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
>
>   collisions:0 txqueuelen:1000
>
>   RX bytes:0 (0.0 B)  TX bytes:0 (0.0 B)
>
>
>
> B.R.
>
> Changcheng
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



-- 
Jason
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Removing MDS

2018-10-30 Thread Rhian Resnick
Evening,


We are running into issues deactivating mds ranks. Is there a way to safely 
forcibly remove a rank?


Rhian Resnick

Associate Director Research Computing

Enterprise Systems

Office of Information Technology


Florida Atlantic University

777 Glades Road, CM22, Rm 173B

Boca Raton, FL 33431

Phone 561.297.2647

Fax 561.297.0222

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Filestore to Bluestore migration question

2018-10-30 Thread Hayashida, Mami
I am relatively new to Ceph and need some advice on Bluestore migration. I
tried migrating a few of our test cluster nodes from Filestore to Bluestore
by following this (
http://docs.ceph.com/docs/luminous/rados/operations/bluestore-migration/)
as the cluster is currently running 12.2.9. The cluster, originally set up
by my predecessors, was running Jewel until I upgraded it recently to
Luminous.

OSDs in each OSD host are set up in such a way that for every 10 data HDD
disks, there is one SSD drive holding their journals.  For example,
osd.0's data is on /dev/sdh and its Filestore journal is on a partition
of /dev/sda. So, lsblk shows something like

sda      8:0    0 447.1G  0 disk
├─sda1   8:1    0    40G  0 part  # journal for osd.0

sdh      8:112  0   3.7T  0 disk
└─sdh1   8:113  0   3.7T  0 part /var/lib/ceph/osd/ceph-0

It seems like this was all set up by my predecessor with the following
command:

ceph-deploy osd create osd0:sdh:/dev/sda


Since sda is an SSD drive, even after the Bluestore migration I plan to keep
the DB and WAL for sdh (and the 9 other data disks) on the sda drive.


I followed the steps all the way up to number 6 of
http://docs.ceph.com/docs/luminous/rados/operations/bluestore-migration/.
Then, instead of ceph-volume lvm zap $DEVICE, I used ceph-deploy disk zap from
an admin node to wipe out the contents of those two drives.  Since
osd.0 - 9 share the SSD drive for their journals, I did the same for
osd.{1..9} as well as /dev/sda.  I then destroyed osd.{0..9} using the osd
destroy command (step 8).

Where something definitely went wrong was the last part. The ceph-volume lvm
create command shown there assumes that the WAL and DB will be on the same
device as the data.  I tried adding --block.wal --block.data to it, but that
did not work. I tried various ceph-deploy commands taken from various versions
of the docs, but nothing seemed to work.  I even tried manually creating LVs
for the WAL and DB (
http://docs.ceph.com/docs/mimic/rados/configuration/bluestore-config-ref/),
but that did not work either. At some point, the following command seemed
to have worked (it was the only command that did not return an error), but then
all the OSDs on the node shut down and I could not bring any of them back
up.

sudo ceph-disk prepare --bluestore /dev/sdh --block.wal=/dev/sda
--block.db=/dev/sda --osd-id 0
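
For reference, the general shape of the ceph-volume invocation for separate
DB/WAL devices looks roughly like the sketch below; the VG/LV names are
placeholders and this is not something I have verified against our setup:

sudo ceph-volume lvm create --bluestore --data /dev/sdh \
    --block.db ceph-db-vg/db-lv-osd0 --block.wal ceph-wal-vg/wal-lv-osd0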

Since this is a test cluster with essentially no data on it, I can always
start over.  But I do need to know how to properly migrate OSDs from
Filestore to Bluestore in this type of setting (with the Filestore journal
residing on an SSD) for our production clusters.  Please let me know if
there are any steps missing in the documentation, particularly for a case
like this, and what commands I need to run to achieve what I am trying to
do.  Also, if it is advisable to upgrade to Mimic first and then perform the
Filestore to Bluestore migration, that is an option as well.


-- 
*Mami Hayashida*

*Research Computing Associate*
Research Computing Infrastructure
University of Kentucky Information Technology Services
301 Rose Street | 102 James F. Hardymon Building
Lexington, KY 40506-0495
mami.hayash...@uky.edu
(859)323-7521
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] reducing min_size on erasure coded pool may allow recovery ?

2018-10-30 Thread Chad W Seys
Thanks for the clarification!  Glad to see this feature is being pursued.

Chad.


On 10/30/2018 12:24 PM, Gregory Farnum wrote:
> On Mon, Oct 29, 2018 at 7:43 PM David Turner  > wrote:
> 
> min_size should be at least k+1 for EC. There are times to use k for
> emergencies like you had. I would suggest setting it back to 3 once
> you're back to healthy.
> 
> As far as why you needed to reduce min_size, my guess would be that
> recovery would have happened as long as k copies were up. Were the
> PGs refusing to backfill or just hadn't backfilled yet?
> 
> 
> Recovery on EC pools requires min_size rather than k shards at this 
> time. There were reasons; they weren't great. We're trying to get a fix 
> tested and merged at https://github.com/ceph/ceph/pull/17619
> -Greg
> 
> 
> 
> On Mon, Oct 29, 2018, 9:24 PM Chad W Seys  > wrote:
> 
> Hi all,
>     Recently our cluster lost a drive and a node (3 drives) at
> the same
> time.  Our erasure coded pools are all k2m2, so if all is working
> correctly no data is lost.
>     However, there were 4 PGs that stayed "incomplete" until I
> finally
> took the suggestion in 'ceph health detail' to reduce min_size .
> (Thanks
> for the hint!)  I'm not sure what it was (likely 3), but setting
> it to 2
> caused all PGs to become active (though degraded) and the
> cluster is on
> path to recovering fully.
> 
>     In replicated pools, would not ceph create replicas without
> the need
> to reduce min_size?  It seems odd to not recover automatically if
> possible.  Could someone explain what was going on there?
> 
>     Also, how to decide what min_size should be?
> 
> Thanks!
> Chad.
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com 
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> 
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com 
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> 
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Balancer module not balancing perfectly

2018-10-30 Thread Steve Taylor
I was having a difficult time getting debug logs from the active mgr,
but I finally got it. Apparently injecting debug_mgr doesn't work, even
when the change is reflected when you query the running config.
Modifying the config file and restarting the mgr got it to log for me.

Now that I have some debug logging, I think I may see the problem.

'ceph config-key dump'
...
"mgr/balancer/active": "1",
"mgr/balancer/max_misplaced": "1",
"mgr/balancer/mode": "upmap",
"mgr/balancer/upmap_max_deviation": "0.0001",
"mgr/balancer/upmap_max_iterations": "1000"

Mgr log excerpt:
2018-10-30 13:25:52.523117 7f08b47ff700  4 mgr[balancer] Optimize plan
upmap-balance
2018-10-30 13:25:52.523135 7f08b47ff700  4 mgr get_config
get_configkey: mgr/balancer/mode
2018-10-30 13:25:52.523141 7f08b47ff700 10 ceph_config_get mode found:
upmap
2018-10-30 13:25:52.523144 7f08b47ff700  4 mgr get_config
get_configkey: mgr/balancer/max_misplaced
2018-10-30 13:25:52.523145 7f08b47ff700 10 ceph_config_get
max_misplaced found: 1
2018-10-30 13:25:52.523178 7f08b47ff700  4 mgr[balancer] Mode upmap,
max misplaced 1.00
2018-10-30 13:25:52.523241 7f08b47ff700 20 mgr[balancer] unknown
0.00 degraded 0.00 inactive 0.00 misplaced 
0
2018-10-30 13:25:52.523288 7f08b47ff700  4 mgr[balancer] do_upmap
2018-10-30 13:25:52.523296 7f08b47ff700  4 mgr get_config
get_configkey: mgr/balancer/upmap_max_iterations
2018-10-30 13:25:52.523298 7f08b47ff700  4 ceph_config_get
upmap_max_iterations not found 
2018-10-30 13:25:52.523301 7f08b47ff700  4 mgr get_config
get_configkey: mgr/balancer/upmap_max_deviation
2018-10-30 13:25:52.523305 7f08b47ff700  4 ceph_config_get
upmap_max_deviation not found 
2018-10-30 13:25:52.523339 7f08b47ff700  4 mgr[balancer] pools ['rbd-
data']
2018-10-30 13:25:52.523350 7f08b47ff700 10 osdmap_calc_pg_upmaps osdmap
0x7f08b1884280 inc 0x7f0898bda800 max_deviation 
0.01 max_iterations 10 pools 3
2018-10-30 13:25:52.579669 7f08bbffc700  4 mgr ms_dispatch active
mgrdigest v1
2018-10-30 13:25:52.579671 7f08bbffc700  4 mgr ms_dispatch mgrdigest v1
2018-10-30 13:25:52.579673 7f08bbffc700 10 mgr handle_mgr_digest 1364
2018-10-30 13:25:52.579674 7f08bbffc700 10 mgr handle_mgr_digest 501
2018-10-30 13:25:52.579677 7f08bbffc700 10 mgr notify_all notify_all:
notify_all mon_status
2018-10-30 13:25:52.579681 7f08bbffc700 10 mgr notify_all notify_all:
notify_all health
2018-10-30 13:25:52.579683 7f08bbffc700 10 mgr notify_all notify_all:
notify_all pg_summary
2018-10-30 13:25:52.579684 7f08bbffc700 10 mgr handle_mgr_digest done.
2018-10-30 13:25:52.603867 7f08b47ff700 10 osdmap_calc_pg_upmaps r = 0
2018-10-30 13:25:52.603982 7f08b47ff700  4 mgr[balancer] prepared 0/10
changes

The mgr claims that mgr/balancer/upmap_max_iterations and
mgr/balancer/upmap_max_deviation aren't found in the config even though
they have been set and appear in the config-key dump. It seems to be
picking up the other config options correctly. Am I doing something
wrong? I feel like I must have a typo or something, but I'm not seeing
it.
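
For reference, the kind of sanity check I can do next (standard commands; the
mgr restart is only to force the module to re-read the config-key values):

ceph config-key get mgr/balancer/upmap_max_deviation
ceph config-key get mgr/balancer/upmap_max_iterations
systemctl restart ceph-mgr.target    # on the active mgr host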


 
Steve Taylor | Senior Software Engineer | StorageCraft Technology Corporation
380 Data Drive Suite 300 | Draper | Utah | 84020
Office: 801.871.2799 | 
 
If you are not the intended recipient of this message or received it 
erroneously, please notify the sender and delete it, together with any 
attachments, and be advised that any dissemination or copying of this message 
is prohibited.

 

On Tue, 2018-10-30 at 10:11 -0600, Steve Taylor wrote:
> I had played with those settings some already, but I just tried again
> with max_deviation set to 0.0001 and max_iterations set to 1000. Same
> result. Thanks for the suggestion though.
> 
> On Tue, 2018-10-30 at 12:06 -0400, David Turner wrote:
> 
> From the balancer module's code for v 12.2.7 I noticed [1] these
> lines which reference [2] these 2 config options for upmap. You might
> try using more max iterations or a smaller max deviation to see if
> you can get a better balance in your cluster. I would try to start
> with [3] these commands/values and see if it improves your balance
> and/or allows you to generate a better map.
> 
> [1] 
> 
https://github.com/ceph/ceph/blob/v12.2.7/src/pybind/mgr/balancer/module.py#L671-L672
> [2] upmap_max_iterations (default 10)
> upmap_max_deviation (default .01)
> 
> [3] ceph config-key set mgr/balancer/upmap_max_iterations 50
> ceph config-key set mgr/balancer/upmap_max_deviation .005
> 
> On Tue, Oct 30, 2018 at 11:14 AM Steve Taylor <
> steve.tay...@storagecraft.com> wrote:
> 
> I have a Luminous 12.2.7 cluster with 2 EC pools, both using k=8
> and
> m=2. Each pool lives on 20 dedicated OSD hosts with 18 OSDs each.
> Each
> pool has 2048 PGs and is distributed across its 360 OSDs with host
> failure domains. The OSDs are identical (4TB) and are weighted with
> default weights (3.73).
> 
> Initially, and not surprisingly, the PG distribution was all over
> the
> place with PG counts per OSD ranging from 40 to 83. I 

Re: [ceph-users] Packages for debian in Ceph repo

2018-10-30 Thread Martin Verges
Hello,

we provide a public mirror documented on
https://croit.io/2018/09/23/2018-09-23-debian-mirror for Ceph Mimic on
Debian Stretch.

--
Martin Verges
Managing director

Mobile: +49 174 9335695
E-Mail: martin.ver...@croit.io
Chat: https://t.me/MartinVerges

croit GmbH, Freseniusstr. 31h, 81247 Munich
CEO: Martin Verges - VAT-ID: DE310638492
Com. register: Amtsgericht Munich HRB 231263

Web: https://croit.io
YouTube: https://goo.gl/PGE1Bx


2018-10-30 17:07 GMT+01:00 Kevin Olbrich :
> Is it possible to use qemu-img with rbd support on Debian Stretch?
> I am on Luminous and try to connect my image-buildserver to load images into
> a ceph pool.
>
>> root@buildserver:~# qemu-img convert -p -O raw /target/test-vm.qcow2
>> rbd:rbd_vms_ssd_01/test_vm
>> qemu-img: Unknown protocol 'rbd'
>
>
> Kevin
>
> On Mon, 3 Sep 2018 at 12:07, Abhishek Lekshmanan wrote:
>>
>> arad...@tma-0.net writes:
>>
>> > Can anyone confirm if the Ceph repos for Debian/Ubuntu contain packages
>> > for
>> > Debian? I'm not seeing any, but maybe I'm missing something...
>> >
>> > I'm seeing ceph-deploy install an older version of ceph on the nodes
>> > (from the
>> > Debian repo) and then failing when I run "ceph-deploy osd ..." because
>> > ceph-
>> > volume doesn't exist on the nodes.
>> >
>> The newer versions of Ceph (from mimic onwards) requires compiler
>> toolchains supporting c++17 which we unfortunately do not have for
>> stretch/jessie yet.
>>
>> -
>> Abhishek
>> ___
>> ceph-users mailing list
>> ceph-users@lists.ceph.com
>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] ceph-mds failure replaying journal

2018-10-30 Thread Jon Morby
So a big thank you to @yanzheng for his help getting this back online

The quick answer to what we did was downgrade to 13.2.1 as 13.2.2 is broken for 
cephfs

restored the backup of the journal I’d taken as part of following the disaster 
recovery process documents

turned off mds standby replay and temporarily stopped all but 2 of the mds so 
we could monitor the logs more easily

we then did a wipe sessions and watched the mds repair

Set mds_wipe_sessions to 1 and restart mds
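
(roughly, in ceph.conf on the MDS hosts - a sketch only; exact placement may differ:)

[mds]
    mds_wipe_sessions = 1    # temporary - set back to 0 / remove once the repair is done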

finally there was a 

$ ceph daemon mds01 scrub_path / repair force recursive

and then setting mds_wipe_sessions back to 0

Jon


I can’t say a big enough thank you to @yanzheng for their assistance though!


> On 29 Oct 2018, at 11:13, Jon Morby (Fido)  wrote:
> 
> I've experimented and whilst the downgrade looks to be working, you end up 
> with errors regarding unsupported feature "mimic" amongst others
> 
> 2018-10-29 10:51:20.652047 7f6f1b9f5080 -1 ERROR: on disk data includes 
> unsupported features: compat={},rocompat={},incompat={10=mimic ondisk layou
> 
> so I gave up on that idea
> 
> In addition to the cephfs volume (which is basically just mirrors and some 
> backups) we have a large rbd deployment using the same ceph cluster, and if 
> we lose that we're screwed ... the cephfs volume was more an "experiment" to 
> see how viable it would be as an NFS replacement
> 
> There's 26TB of data on there, so I'd rather not have to go off and 
> redownload it all .. but losing it isn't the end of the world (but it will 
> piss off a few friends)
> 
> Jon
> 
> 
> - On 29 Oct, 2018, at 09:54, Zheng Yan  wrote:
> 
> 
> On Mon, Oct 29, 2018 at 5:25 PM Jon Morby (Fido)  > wrote:
> Hi
> 
> Ideally we'd like to undo the whole accidental upgrade to 13.x and ensure 
> that ceph-deploy doesn't do another major release upgrade without a lot of 
> warnings
> 
> Either way, I'm currently getting errors that 13.2.1 isn't available / shaman 
> is offline / etc
> 
> What's the best / recommended way of doing this downgrade across our estate?
> 
> 
> You have already upgraded ceph-mon. I don't know If it can be safely 
> downgraded (If I remember right, I corrupted monitor's data when downgrading 
> ceph-mon from minic to luminous). 
>  
> 
> 
> - On 29 Oct, 2018, at 08:19, Yan, Zheng  > wrote:
> 
> We backported a wrong patch to 13.2.2.  Downgrade ceph to 13.2.1, then run
> 'ceph mds repaired fido_fs:1'.
> Sorry for the trouble
> Yan, Zheng
> 
> On Mon, Oct 29, 2018 at 7:48 AM Jon Morby  > wrote:
> 
> We accidentally found ourselves upgraded from 12.2.8 to 13.2.2 after a 
> ceph-deploy install went awry (we were expecting it to upgrade to 12.2.9 and 
> not jump a major release without warning)
> 
> Anyway .. as a result, we ended up with an mds journal error and 1 daemon 
> reporting as damaged
> 
> Having got nowhere trying to ask for help on irc, we've followed various 
> forum posts and disaster recovery guides, we ended up resetting the journal 
> which left the daemon as no longer “damaged” however we’re now seeing mds 
> segfault whilst trying to replay 
> 
> https://pastebin.com/iSLdvu0b 
> 
> 
> 
> /build/ceph-13.2.2/src/mds/journal.cc : 1572: FAILED 
> assert(g_conf->mds_wipe_sessions)
> 
>  ceph version 13.2.2 (02899bfda814146b021136e9d8e80eba494e1126) mimic (stable)
>  1: (ceph::__ceph_assert_fail(char const*, char const*, int, char 
> const*)+0x102) [0x7fad637f70f2]
>  2: (()+0x3162b7) [0x7fad637f72b7]
>  3: (EMetaBlob::replay(MDSRank*, LogSegment*, MDSlaveUpdate*)+0x5f4b) 
> [0x7a7a6b]
>  4: (EUpdate::replay(MDSRank*)+0x39) [0x7a8fa9]
>  5: (MDLog::_replay_thread()+0x864) [0x752164]
>  6: (MDLog::ReplayThread::entry()+0xd) [0x4f021d]
>  7: (()+0x76ba) [0x7fad6305a6ba]
>  8: (clone()+0x6d) [0x7fad6288341d]
>  NOTE: a copy of the executable, or `objdump -rdS ` is needed to 
> interpret this.
> 
> 
> full logs
> 
> https://pastebin.com/X5UG9vT2 
> 
> We’ve been unable to access the cephfs file system since all of this started 
> …. attempts to mount fail with reports that “mds probably not available” 
> 
> Oct 28 23:47:02 mirrors kernel: [115602.911193] ceph: probably no mds server 
> is up
> 
> 
> root@mds02:~# ceph -s
>   cluster:
> id: 78d5bf7d-b074-47ab-8d73-bd4d99df98a5
> health: HEALTH_WARN
> 1 filesystem is degraded
> insufficient standby MDS daemons available
> too many PGs per OSD (276 > max 250)
> 
>   services:
> mon: 3 daemons, quorum mon01,mon02,mon03
> mgr: mon01(active), standbys: mon02, mon03
> mds: fido_fs-2/2/1 up  {0=mds01=up:resolve,1=mds02=up:replay(laggy or 
> crashed)}
> osd: 27 osds: 27 up, 27 in
> 
>   data:
> pools:   15 pools, 3168 pgs
> objects: 16.97 M objects, 30 TiB
> usage:   71 TiB used, 27 TiB / 98 TiB avail
> pgs: 3168 active+clean
> 
>   io:
> 

Re: [ceph-users] node not using cluster subnet

2018-10-30 Thread Steven Vacaroaia
Thanks for taking the trouble to provide advice
I found that the Juniper switch port for the server that did not work did
not have the MTU changed to 9200
I am using MTU 9000 for the cluster network
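
For reference, the quick end-to-end test I use for jumbo frames (8972 = 9000
minus IP/ICMP headers; the address is a placeholder from the cluster subnet):

ping -M do -s 8972 192.168.200.201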

Not sure why packet fragmentation created issues but ...all seems fine now

Thanks
Steven



On Tue, 30 Oct 2018 at 13:22, Gregory Farnum  wrote:

> The OSDs ping each other on both public and cluster networks. Perhaps the
> routing isn't working on the public network? Or maybe it's trying to ping
> from the cluster 192. network into the public 10. network and that isn't
> getting through?
> -Greg
>
> On Tue, Oct 30, 2018 at 8:34 AM Steven Vacaroaia  wrote:
>
>> Hi,
>> I am trying to add another node to my cluster which is configured to use
>> a dedicated subnet
>>
>> public_network = 10.10.35.0/24
>> cluster_network = 192.168.200.0/24
>>
>> For whatever reason, this node is starting properly and a few seconds later
>> is failing
>> and starting to check for connectivity on the public network
>>
>> The other 3 nodes are working fine
>> Nodes are identical
>>
>> Using kernel 4.18 and Mimic 13.2.2
>>
>> No firewall is involved
>>
>> I am really puzzled by this - any suggestions will be appreciated
>>
>> I have purged and reinstalled - also make sure I can ping using cluster
>> network
>>
>> 2018-10-30 11:09:28.344 7f274b537700  1 osd.3 308 state: booting -> active
>> 2018-10-30 11:09:29.621 7f275b848700  0 -- 192.168.200.204:6800/18679 >>
>> 192.168.200.201:6802/5008172 conn(0x557ed0318600 :6800
>> s=STATE_ACCEPTING_WAIT_CONNECT_MSG_AUTH pgs=0 cs=0 l=0).handle_connect_msg:
>> challenging authorizer
>> 2018-10-30 11:09:29.621 7f275b047700  0 -- 192.168.200.204:6800/18679 >>
>> 192.168.200.203:6800/6002192 conn(0x557ed0318c00 :6800
>> s=STATE_ACCEPTING_WAIT_CONNECT_MSG_AUTH pgs=0 cs=0 l=0).handle_connect_msg:
>> challenging authorizer
>> 2018-10-30 11:09:29.621 7f275b848700  0 -- 192.168.200.204:6800/18679 >>
>> 192.168.200.201:6802/5008172 conn(0x557ed0318000 :-1
>> s=STATE_ACCEPTING_WAIT_CONNECT_MSG_AUTH pgs=0 cs=0 l=0).handle_connect_msg:
>> challenging authorizer
>> 2018-10-30 11:09:29.621 7f275b047700  0 -- 192.168.200.204:6800/18679 >>
>> 192.168.200.203:6800/6002192 conn(0x557ed0319800 :-1
>> s=STATE_ACCEPTING_WAIT_CONNECT_MSG_AUTH pgs=0 cs=0 l=0).handle_connect_msg:
>> challenging authorizer
>> 2018-10-30 11:09:49.923 7f2756d4e700 -1 osd.3 308 heartbeat_check: no
>> reply from 10.10.35.201:6802 osd.0 ever on either front or back, first
>> ping sent 2018-10-30 11:09:29.621624 (cutoff 2018-10-30 11:09:29.924534)
>> 2018-10-30 11:09:49.923 7f2756d4e700 -1 osd.3 308 heartbeat_check: no
>> reply from 10.10.35.202:6802 osd.1 ever on either front or back, first
>> ping sent 2018-10-30 11:09:29.621624 (cutoff 2018-10-30 11:09:29.924534)
>>
>> ___
>> ceph-users mailing list
>> ceph-users@lists.ceph.com
>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] reducing min_size on erasure coded pool may allow recovery ?

2018-10-30 Thread Gregory Farnum
On Mon, Oct 29, 2018 at 7:43 PM David Turner  wrote:

> min_size should be at least k+1 for EC. There are times to use k for
> emergencies like you had. I would suggest setting it back to 3 once you're
> back to healthy.
>
> As far as why you needed to reduce min_size, my guess would be that
> recovery would have happened as long as k copies were up. Were the PGs
> refusing to backfill or just hadn't backfilled yet?
>

Recovery on EC pools requires min_size rather than k shards at this time.
There were reasons; they weren't great. We're trying to get a fix tested
and merged at https://github.com/ceph/ceph/pull/17619
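
So for a k=2,m=2 pool the usual setting, once the cluster is healthy again,
would be along the lines of (pool name is a placeholder):

ceph osd pool set ec-k2m2-pool min_size 3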
-Greg


>
>
> On Mon, Oct 29, 2018, 9:24 PM Chad W Seys  wrote:
>
>> Hi all,
>>Recently our cluster lost a drive and a node (3 drives) at the same
>> time.  Our erasure coded pools are all k2m2, so if all is working
>> correctly no data is lost.
>>However, there were 4 PGs that stayed "incomplete" until I finally
>> took the suggestion in 'ceph health detail' to reduce min_size . (Thanks
>> for the hint!)  I'm not sure what it was (likely 3), but setting it to 2
>> caused all PGs to become active (though degraded) and the cluster is on
>> path to recovering fully.
>>
>>In replicated pools, would not ceph create replicas without the need
>> to reduce min_size?  It seems odd to not recover automatically if
>> possible.  Could someone explain what was going on there?
>>
>>Also, how to decide what min_size should be?
>>
>> Thanks!
>> Chad.
>> ___
>> ceph-users mailing list
>> ceph-users@lists.ceph.com
>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] node not using cluster subnet

2018-10-30 Thread Gregory Farnum
The OSDs ping each other on both public and cluster networks. Perhaps the
routing isn't working on the public network? Or maybe it's trying to ping
from the cluster 192. network into the public 10. network and that isn't
getting through?
-Greg

On Tue, Oct 30, 2018 at 8:34 AM Steven Vacaroaia  wrote:

> Hi,
> I am trying to add another node to my cluster which is configured to use
> a dedicated subnet
>
> public_network = 10.10.35.0/24
> cluster_network = 192.168.200.0/24
>
> For whatever reason, this node is starting properly and a few seconds later
> is failing
> and starting to check for connectivity on the public network
>
> The other 3 nodes are working fine
> Nodes are identical
>
> Using kernel 4.18 and Mimic 13.2.2
>
> No firewall is involved
>
> I am really puzzled by this - any suggestions will be appreciated
>
> I have purged and reinstalled - also make sure I can ping using cluster
> network
>
> 2018-10-30 11:09:28.344 7f274b537700  1 osd.3 308 state: booting -> active
> 2018-10-30 11:09:29.621 7f275b848700  0 -- 192.168.200.204:6800/18679 >>
> 192.168.200.201:6802/5008172 conn(0x557ed0318600 :6800
> s=STATE_ACCEPTING_WAIT_CONNECT_MSG_AUTH pgs=0 cs=0 l=0).handle_connect_msg:
> challenging authorizer
> 2018-10-30 11:09:29.621 7f275b047700  0 -- 192.168.200.204:6800/18679 >>
> 192.168.200.203:6800/6002192 conn(0x557ed0318c00 :6800
> s=STATE_ACCEPTING_WAIT_CONNECT_MSG_AUTH pgs=0 cs=0 l=0).handle_connect_msg:
> challenging authorizer
> 2018-10-30 11:09:29.621 7f275b848700  0 -- 192.168.200.204:6800/18679 >>
> 192.168.200.201:6802/5008172 conn(0x557ed0318000 :-1
> s=STATE_ACCEPTING_WAIT_CONNECT_MSG_AUTH pgs=0 cs=0 l=0).handle_connect_msg:
> challenging authorizer
> 2018-10-30 11:09:29.621 7f275b047700  0 -- 192.168.200.204:6800/18679 >>
> 192.168.200.203:6800/6002192 conn(0x557ed0319800 :-1
> s=STATE_ACCEPTING_WAIT_CONNECT_MSG_AUTH pgs=0 cs=0 l=0).handle_connect_msg:
> challenging authorizer
> 2018-10-30 11:09:49.923 7f2756d4e700 -1 osd.3 308 heartbeat_check: no
> reply from 10.10.35.201:6802 osd.0 ever on either front or back, first
> ping sent 2018-10-30 11:09:29.621624 (cutoff 2018-10-30 11:09:29.924534)
> 2018-10-30 11:09:49.923 7f2756d4e700 -1 osd.3 308 heartbeat_check: no
> reply from 10.10.35.202:6802 osd.1 ever on either front or back, first
> ping sent 2018-10-30 11:09:29.621624 (cutoff 2018-10-30 11:09:29.924534)
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Packages for debian in Ceph repo

2018-10-30 Thread Kevin Olbrich
Hi!

Proxmox has support for rbd as they ship additional packages as well as
ceph via their own repo.

I ran your command and got this:

> qemu-img version 2.8.1(Debian 1:2.8+dfsg-6+deb9u4)
> Copyright (c) 2003-2016 Fabrice Bellard and the QEMU Project developers
> Supported formats: blkdebug blkreplay blkverify bochs cloop dmg file ftp
> ftps gluster host_cdrom host_device http https iscsi iser luks nbd nfs
> null-aio null-co parallels qcow qcow2 qed quorum raw rbd replication
> sheepdog ssh vdi vhdx vmdk vpc vvfat


It lists rbd but still fails with the exact same error.

Kevin


Am Di., 30. Okt. 2018 um 17:14 Uhr schrieb David Turner <
drakonst...@gmail.com>:

> What version of qemu-img are you using?  I found [1] this when poking
> around on my qemu server when checking for rbd support.  This version (note
> it's proxmox) has rbd listed as a supported format.
>
> [1]
> # qemu-img -V; qemu-img --help|grep rbd
> qemu-img version 2.11.2pve-qemu-kvm_2.11.2-1
> Copyright (c) 2003-2017 Fabrice Bellard and the QEMU Project developers
> Supported formats: blkdebug blkreplay blkverify bochs cloop dmg file ftp
> ftps gluster host_cdrom host_device http https iscsi iser luks nbd null-aio
> null-co parallels qcow qcow2 qed quorum raw rbd replication sheepdog
> throttle vdi vhdx vmdk vpc vvfat zeroinit
> On Tue, Oct 30, 2018 at 12:08 PM Kevin Olbrich  wrote:
>
>> Is it possible to use qemu-img with rbd support on Debian Stretch?
>> I am on Luminous and try to connect my image-buildserver to load images
>> into a ceph pool.
>>
>> root@buildserver:~# qemu-img convert -p -O raw /target/test-vm.qcow2
>>> rbd:rbd_vms_ssd_01/test_vm
>>> qemu-img: Unknown protocol 'rbd'
>>
>>
>> Kevin
>>
>> On Mon, 3 Sep 2018 at 12:07, Abhishek Lekshmanan <abhis...@suse.com> wrote:
>>
>>> arad...@tma-0.net writes:
>>>
>>> > Can anyone confirm if the Ceph repos for Debian/Ubuntu contain
>>> packages for
>>> > Debian? I'm not seeing any, but maybe I'm missing something...
>>> >
>>> > I'm seeing ceph-deploy install an older version of ceph on the nodes
>>> (from the
>>> > Debian repo) and then failing when I run "ceph-deploy osd ..." because
>>> ceph-
>>> > volume doesn't exist on the nodes.
>>> >
>>> The newer versions of Ceph (from mimic onwards) requires compiler
>>> toolchains supporting c++17 which we unfortunately do not have for
>>> stretch/jessie yet.
>>>
>>> -
>>> Abhishek
>>> ___
>>> ceph-users mailing list
>>> ceph-users@lists.ceph.com
>>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>>
>> ___
>> ceph-users mailing list
>> ceph-users@lists.ceph.com
>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Packages for debian in Ceph repo

2018-10-30 Thread David Turner
What version of qemu-img are you using?  I found [1] this when poking
around on my qemu server when checking for rbd support.  This version (note
it's proxmox) has rbd listed as a supported format.

[1]
# qemu-img -V; qemu-img --help|grep rbd
qemu-img version 2.11.2pve-qemu-kvm_2.11.2-1
Copyright (c) 2003-2017 Fabrice Bellard and the QEMU Project developers
Supported formats: blkdebug blkreplay blkverify bochs cloop dmg file ftp
ftps gluster host_cdrom host_device http https iscsi iser luks nbd null-aio
null-co parallels qcow qcow2 qed quorum raw rbd replication sheepdog
throttle vdi vhdx vmdk vpc vvfat zeroinit
On Tue, Oct 30, 2018 at 12:08 PM Kevin Olbrich  wrote:

> Is it possible to use qemu-img with rbd support on Debian Stretch?
> I am on Luminous and try to connect my image-buildserver to load images
> into a ceph pool.
>
> root@buildserver:~# qemu-img convert -p -O raw /target/test-vm.qcow2
>> rbd:rbd_vms_ssd_01/test_vm
>> qemu-img: Unknown protocol 'rbd'
>
>
> Kevin
>
> On Mon, 3 Sep 2018 at 12:07, Abhishek Lekshmanan <abhis...@suse.com> wrote:
>
>> arad...@tma-0.net writes:
>>
>> > Can anyone confirm if the Ceph repos for Debian/Ubuntu contain packages
>> for
>> > Debian? I'm not seeing any, but maybe I'm missing something...
>> >
>> > I'm seeing ceph-deploy install an older version of ceph on the nodes
>> (from the
>> > Debian repo) and then failing when I run "ceph-deploy osd ..." because
>> ceph-
>> > volume doesn't exist on the nodes.
>> >
>> The newer versions of Ceph (from mimic onwards) requires compiler
>> toolchains supporting c++17 which we unfortunately do not have for
>> stretch/jessie yet.
>>
>> -
>> Abhishek
>> ___
>> ceph-users mailing list
>> ceph-users@lists.ceph.com
>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Balancer module not balancing perfectly

2018-10-30 Thread Steve Taylor
I had played with those settings some already, but I just tried again
with max_deviation set to 0.0001 and max_iterations set to 1000. Same
result. Thanks for the suggestion though.


 
Steve Taylor | Senior Software Engineer | StorageCraft Technology Corporation
380 Data Drive Suite 300 | Draper | Utah | 84020
Office: 801.871.2799 | 
 
If you are not the intended recipient of this message or received it 
erroneously, please notify the sender and delete it, together with any 
attachments, and be advised that any dissemination or copying of this message 
is prohibited.

 

On Tue, 2018-10-30 at 12:06 -0400, David Turner wrote:
> From the balancer module's code for v 12.2.7 I noticed [1] these
> lines which reference [2] these 2 config options for upmap. You might
> try using more max iterations or a smaller max deviation to see if
> you can get a better balance in your cluster. I would try to start
> with [3] these commands/values and see if it improves your balance
> and/or allows you to generate a better map.
> 
> [1] 
> https://github.com/ceph/ceph/blob/v12.2.7/src/pybind/mgr/balancer/module.py#L671-L672
> [2] upmap_max_iterations (default 10)
> upmap_max_deviation (default .01)
> 
> [3] ceph config-key set mgr/balancer/upmap_max_iterations 50
> ceph config-key set mgr/balancer/upmap_max_deviation .005
> 
> On Tue, Oct 30, 2018 at 11:14 AM Steve Taylor <
> steve.tay...@storagecraft.com> wrote:
> > I have a Luminous 12.2.7 cluster with 2 EC pools, both using k=8
> > and
> > m=2. Each pool lives on 20 dedicated OSD hosts with 18 OSDs each.
> > Each
> > pool has 2048 PGs and is distributed across its 360 OSDs with host
> > failure domains. The OSDs are identical (4TB) and are weighted with
> > default weights (3.73).
> > 
> > Initially, and not surprisingly, the PG distribution was all over
> > the
> > place with PG counts per OSD ranging from 40 to 83. I enabled the
> > balancer module in upmap mode and let it work its magic, which
> > reduced
> > the range of the per-OSD PG counts to 56-61.
> > 
> > While 56-61 is obviously a whole lot better than 40-83, with upmap
> > I
> > expected the range to be 56-57. If I run 'ceph balancer optimize
> > ' again to attempt to create a new plan I get 'Error
> > EALREADY:
> > Unable to find further optimization,or distribution is already
> > perfect.' I set the balancer's max_misplaced value to 1 in case
> > that
> > was preventing further optimization, but I still get the same
> > error.
> > 
> > I'm sure I'm missing some config option or something that will
> > allow it
> > to do better, but thus far I haven't been able to find anything in
> > the
> > docs, mailing list archives, or balancer source code that helps.
> > Any
> > ideas?
> > 
> > 
> > Steve Taylor | Senior Software Engineer | StorageCraft Technology
> > Corporation
> > 380 Data Drive Suite 300 | Draper | Utah | 84020
> > Office: 801.871.2799 | 
> > 
> > If you are not the intended recipient of this message or received
> > it erroneously, please notify the sender and delete it, together
> > with any attachments, and be advised that any dissemination or
> > copying of this message is prohibited.
> > 
> > 
> > 
> > ___
> > ceph-users mailing list
> > ceph-users@lists.ceph.com
> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> 
> 
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Packages for debian in Ceph repo

2018-10-30 Thread Kevin Olbrich
Is it possible to use qemu-img with rbd support on Debian Stretch?
I am on Luminous and try to connect my image-buildserver to load images
into a ceph pool.

root@buildserver:~# qemu-img convert -p -O raw /target/test-vm.qcow2
> rbd:rbd_vms_ssd_01/test_vm
> qemu-img: Unknown protocol 'rbd'


Kevin

On Mon, 3 Sep 2018 at 12:07, Abhishek Lekshmanan <abhis...@suse.com> wrote:

> arad...@tma-0.net writes:
>
> > Can anyone confirm if the Ceph repos for Debian/Ubuntu contain packages
> for
> > Debian? I'm not seeing any, but maybe I'm missing something...
> >
> > I'm seeing ceph-deploy install an older version of ceph on the nodes
> (from the
> > Debian repo) and then failing when I run "ceph-deploy osd ..." because
> ceph-
> > volume doesn't exist on the nodes.
> >
> The newer versions of Ceph (from mimic onwards) requires compiler
> toolchains supporting c++17 which we unfortunately do not have for
> stretch/jessie yet.
>
> -
> Abhishek
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Balancer module not balancing perfectly

2018-10-30 Thread David Turner
From the balancer module's code for v 12.2.7 I noticed [1] these lines
which reference [2] these 2 config options for upmap. You might try using
more max iterations or a smaller max deviation to see if you can get a
better balance in your cluster. I would try to start with [3] these
commands/values and see if it improves your balance and/or allows you to
generate a better map.

[1]
https://github.com/ceph/ceph/blob/v12.2.7/src/pybind/mgr/balancer/module.py#L671-L672
[2] upmap_max_iterations (default 10)
upmap_max_deviation (default .01)
[3] ceph config-key set mgr/balancer/upmap_max_iterations 50
ceph config-key set mgr/balancer/upmap_max_deviation .005

On Tue, Oct 30, 2018 at 11:14 AM Steve Taylor 
wrote:

> I have a Luminous 12.2.7 cluster with 2 EC pools, both using k=8 and
> m=2. Each pool lives on 20 dedicated OSD hosts with 18 OSDs each. Each
> pool has 2048 PGs and is distributed across its 360 OSDs with host
> failure domains. The OSDs are identical (4TB) and are weighted with
> default weights (3.73).
>
> Initially, and not surprisingly, the PG distribution was all over the
> place with PG counts per OSD ranging from 40 to 83. I enabled the
> balancer module in upmap mode and let it work its magic, which reduced
> the range of the per-OSD PG counts to 56-61.
>
> While 56-61 is obviously a whole lot better than 40-83, with upmap I
> expected the range to be 56-57. If I run 'ceph balancer optimize
> ' again to attempt to create a new plan I get 'Error EALREADY:
> Unable to find further optimization,or distribution is already
> perfect.' I set the balancer's max_misplaced value to 1 in case that
> was preventing further optimization, but I still get the same error.
>
> I'm sure I'm missing some config option or something that will allow it
> to do better, but thus far I haven't been able to find anything in the
> docs, mailing list archives, or balancer source code that helps. Any
> ideas?
>
>
> Steve Taylor | Senior Software Engineer | StorageCraft Technology
> Corporation
> 380 Data Drive Suite 300 | Draper | Utah | 84020
> Office: 801.871.2799 |
>
> If you are not the intended recipient of this message or received it
> erroneously, please notify the sender and delete it, together with any
> attachments, and be advised that any dissemination or copying of this
> message is prohibited.
>
>
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] RBD: create imaged with qemu

2018-10-30 Thread Liu, Changcheng
Hi all,
 I followed the guide below to create an image with qemu-rbd: qemu-img create -f
raw rbd:quick_rbd_test/own_image 5G;
http://docs.ceph.com/docs/master/rbd/qemu-rbd/
However, it always shows "connect error". Does anyone know how to resolve the 
problem?

The info is below:
nstcc0@nstcloudcc0:lst_deploy$ sudo rados lspools
.rgw.root
default.rgw.control
default.rgw.meta
default.rgw.log
quick_rbd_test

nstcc0@nstcloudcc0:lst_deploy$ qemu-img create -f raw 
rbd:quick_rbd_test/own_image 5G; dmesg | grep -v 'UFW' | tail -n 5
Formatting 'rbd:quick_rbd_test/own_image', fmt=raw size=5368709120
qemu-img: rbd:quick_rbd_test/own_image: error connecting

[30696.520273] virbr0: port 1(virbr0-nic) entered blocking state
[30696.520278] virbr0: port 1(virbr0-nic) entered listening state
[30698.472362] virbr0: port 1(virbr0-nic) entered disabled state
[30698.478503] device virbr0-nic left promiscuous mode
[30698.478569] virbr0: port 1(virbr0-nic) entered disabled state


nstcc0@nstcloudcc0:lst_deploy$ ifconfig
eno1  Link encap:Ethernet  HWaddr 00:1e:67:94:65:ae
  inet addr:10.239.48.91  Bcast:10.239.48.255  Mask:255.255.255.0
  inet6 addr: fe80::651e:6989:e32c:b0b2/64 Scope:Link
  UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
  RX packets:1649029 errors:0 dropped:0 overruns:0 frame:0
  TX packets:870237 errors:0 dropped:0 overruns:0 carrier:0
  collisions:0 txqueuelen:1000
  RX bytes:1492566309 (1.4 GB)  TX bytes:23775 (237.7 MB)

loLink encap:Local Loopback
  inet addr:127.0.0.1  Mask:255.0.0.0
  inet6 addr: ::1/128 Scope:Host
  UP LOOPBACK RUNNING  MTU:65536  Metric:1
  RX packets:52119 errors:0 dropped:0 overruns:0 frame:0
  TX packets:52119 errors:0 dropped:0 overruns:0 carrier:0
  collisions:0 txqueuelen:1000
  RX bytes:84885344 (84.8 MB)  TX bytes:84885344 (84.8 MB)

virbr0Link encap:Ethernet  HWaddr 00:00:00:00:00:00
  inet addr:192.168.122.1  Bcast:192.168.122.255  Mask:255.255.255.0
  UP BROADCAST MULTICAST  MTU:1500  Metric:1
  RX packets:0 errors:0 dropped:0 overruns:0 frame:0
  TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
  collisions:0 txqueuelen:1000
  RX bytes:0 (0.0 B)  TX bytes:0 (0.0 B)

B.R.
Changcheng
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] node not using cluster subnet

2018-10-30 Thread Steven Vacaroaia
Hi,
I am trying to add another node to my cluster which is configured to use  a
dedicated subnet

public_network = 10.10.35.0/24
cluster_network = 192.168.200.0/24

For whatever reason, this node is starting properly and a few seconds later is
failing
and starting to check for connectivity on the public network

The other 3 nodes are working fine
Nodes are identical

Using kernel 4.18 and Mimic 13.2.2

No firewall is involved

I am really puzzled by this - any suggestions will be appreciated

I have purged and reinstalled - also make sure I can ping using cluster
network

2018-10-30 11:09:28.344 7f274b537700  1 osd.3 308 state: booting -> active
2018-10-30 11:09:29.621 7f275b848700  0 -- 192.168.200.204:6800/18679 >>
192.168.200.201:6802/5008172 conn(0x557ed0318600 :6800
s=STATE_ACCEPTING_WAIT_CONNECT_MSG_AUTH pgs=0 cs=0 l=0).handle_connect_msg:
challenging authorizer
2018-10-30 11:09:29.621 7f275b047700  0 -- 192.168.200.204:6800/18679 >>
192.168.200.203:6800/6002192 conn(0x557ed0318c00 :6800
s=STATE_ACCEPTING_WAIT_CONNECT_MSG_AUTH pgs=0 cs=0 l=0).handle_connect_msg:
challenging authorizer
2018-10-30 11:09:29.621 7f275b848700  0 -- 192.168.200.204:6800/18679 >>
192.168.200.201:6802/5008172 conn(0x557ed0318000 :-1
s=STATE_ACCEPTING_WAIT_CONNECT_MSG_AUTH pgs=0 cs=0 l=0).handle_connect_msg:
challenging authorizer
2018-10-30 11:09:29.621 7f275b047700  0 -- 192.168.200.204:6800/18679 >>
192.168.200.203:6800/6002192 conn(0x557ed0319800 :-1
s=STATE_ACCEPTING_WAIT_CONNECT_MSG_AUTH pgs=0 cs=0 l=0).handle_connect_msg:
challenging authorizer
2018-10-30 11:09:49.923 7f2756d4e700 -1 osd.3 308 heartbeat_check: no reply
from 10.10.35.201:6802 osd.0 ever on either front or back, first ping sent
2018-10-30 11:09:29.621624 (cutoff 2018-10-30 11:09:29.924534)
2018-10-30 11:09:49.923 7f2756d4e700 -1 osd.3 308 heartbeat_check: no reply
from 10.10.35.202:6802 osd.1 ever on either front or back, first ping sent
2018-10-30 11:09:29.621624 (cutoff 2018-10-30 11:09:29.924534)
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] New us-central mirror request

2018-10-30 Thread Zachary Muller
Hi all,

We are GigeNET, a datacenter based in Arlington Heights, IL (close to
Chicago). We are starting to mirror ceph and would like to become an
official mirror. We meet all of the requirements and have 2x bonded 1Gbps
NICs.

http://mirrors.gigenet.com/ceph/

Regards,

Zachary Muller
Systems Administrator
GigeNET.com | GigeNETCloud.com | DDoSProtection.com
800-561-2656 x 8119
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Balancer module not balancing perfectly

2018-10-30 Thread Steve Taylor
I have a Luminous 12.2.7 cluster with 2 EC pools, both using k=8 and
m=2. Each pool lives on 20 dedicated OSD hosts with 18 OSDs each. Each
pool has 2048 PGs and is distributed across its 360 OSDs with host
failure domains. The OSDs are identical (4TB) and are weighted with
default weights (3.73).

Initially, and not surprisingly, the PG distribution was all over the
place with PG counts per OSD ranging from 40 to 83. I enabled the
balancer module in upmap mode and let it work its magic, which reduced
the range of the per-OSD PG counts to 56-61.
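
For reference, the rough way I have been eyeballing the per-OSD PG spread
(assumes the PGS column is the last field of 'ceph osd df' output, as in
Luminous):

ceph osd df | awk '$1 ~ /^[0-9]+$/ {print $NF}' | sort -n | uniq -c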

While 56-61 is obviously a whole lot better than 40-83, with upmap I
expected the range to be 56-57. If I run 'ceph balancer optimize
' again to attempt to create a new plan I get 'Error EALREADY:
Unable to find further optimization,or distribution is already
perfect.' I set the balancer's max_misplaced value to 1 in case that
was preventing further optimization, but I still get the same error.

I'm sure I'm missing some config option or something that will allow it
to do better, but thus far I haven't been able to find anything in the
docs, mailing list archives, or balancer source code that helps. Any
ideas?

 
Steve Taylor | Senior Software Engineer | StorageCraft Technology Corporation
380 Data Drive Suite 300 | Draper | Utah | 84020
Office: 801.871.2799 | 
 
If you are not the intended recipient of this message or received it 
erroneously, please notify the sender and delete it, together with any 
attachments, and be advised that any dissemination or copying of this message 
is prohibited.

 

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Large omap objects - how to fix ?

2018-10-30 Thread Tomasz Płaza

Hi hijackers,

Please read: 
http://lists.ceph.com/pipermail/ceph-users-ceph.com/2018-October/030317.html


TL;DR: Ceph should reshard big indexes, but after that it leaves the old
indexes to be removed manually. Starting from some version, deep-scrub
reports indexes above some threshold as HEALTH_WARN. You should find it in
the OSD logs. If you do not have the logs, just run listomapkeys on every
object in default.rgw.buckets.index and find the biggest ones... it should
be safe to remove those (radosgw-admin bi purge), but I cannot guarantee it.
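
A rough sketch of that search (pool name as used above; this just counts omap
keys per index object):

for obj in $(rados -p default.rgw.buckets.index ls); do
  echo "$(rados -p default.rgw.buckets.index listomapkeys "$obj" | wc -l) $obj"
done | sort -rn | head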



On 26.10.2018 at 17:18, Florian Engelmann wrote:

Hi,

hijacking the hijacker! Sorry!

radosgw-admin bucket reshard --bucket somebucket --num-shards 8
*** NOTICE: operation will not remove old bucket index objects ***
*** these will need to be removed manually ***
tenant:
bucket name: somebucket
old bucket instance id: cb1594b3-a782-49d0-a19f-68cd48870a63.1923153.1
new bucket instance id: cb1594b3-a782-49d0-a19f-68cd48870a63.3119759.1
total entries: 1000 2000 3000 4000 5000 6000 7000 8000 9000 10000 11000 12000
[... progress counter continues in increments of 1000 ...]
203000 204000 205000 206000 207000 207660


What to do now?

ceph -s is still:

    health: HEALTH_WARN
    1 large omap objects

But I have no idea how to:
*** NOTICE: operation will not remove old bucket index objects ***
*** these will need to be removed manually ***


All the best,
Flo


Am 10/26/18 um 3:56 PM schrieb Alexandru Cucu:

Hi,

Sorry to hijack this thread. I have a similar issue also with 12.2.8
recently upgraded from Jewel.

I my case all buckets are within limits:
 # radosgw-admin bucket limit check | jq 
'.[].buckets[].fill_status' | uniq

 "OK"

 # radosgw-admin bucket limit check | jq
'.[].buckets[].objects_per_shard'  | sort -n | uniq
 0
 1
 30
 109
 516
 5174
 50081
 50088
 50285
 50323
 50336
 51826

rgw_max_objs_per_shard is set to the default of 100k

---
Alex Cucu

On Fri, Oct 26, 2018 at 4:09 PM Ben Morrice  wrote:


Hello all,

After a recent Luminous upgrade (now running 12.2.8 with all OSDs
migrated to bluestore, upgraded from 11.2.0 and running filestore) I am
currently experiencing the warning 'large omap objects'.
I know this is related to large buckets in radosgw, and luminous
supports 'dynamic sharding' - however I feel that something is missing
from our configuration and I'm a bit confused on what the right approach
is to fix it.

First a bit of background info:

We previously had a multi site radosgw installation, however recently we
decommissioned the second site. With the radosgw multi-site
configuration we had 'bucket_index_max_shards = 0'. Since
decommissioning the second site, I have removed the secondary zonegroup
and changed 'bucket_index_max_shards' to be 16 for the single 
primary zone.

All our buckets do not have a 'num_shards' field when running
'radosgw-admin bucket stats --bucket <bucket name>'.
Is this normal?

Also - I'm finding it difficult to find out exactly what to do with the
buckets that are affected with 'large omap' (see commands below).
My interpretation of 'search the cluster log' is also listed below.

What do I need to do to with the below buckets get back to an overall
ceph HEALTH OK state ? :)


# ceph health detail
HEALTH_WARN 2 large omap objects
2 large objects found in pool '.bbp-gva-master.rgw.buckets.index'
Search the cluster log for 'Large omap object found' for more details.

# ceph osd pool get .bbp-gva-master.rgw.buckets.index pg_num
pg_num: 64

# for i in `ceph pg ls-by-pool .bbp-gva-master.rgw.buckets.index | tail
-n +2 | awk '{print $1}'`; do echo -n "$i: "; ceph pg $i query |grep
num_large_omap_objects | head -1 | awk '{print $2}'; done | grep ": 1"

Re: [ceph-users] OSD node reinstallation

2018-10-30 Thread David Turner
Basically it's a good idea to backup your /etc/ceph/ folder to reinstall
the node. Most everything you need will be in there for your osds.
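A minimal sketch of that, with paths assumed from a default deployment (adjust 
as needed):

# before the reinstall: copy the config and keyrings somewhere off the node
tar czf ceph-node-backup.tgz /etc/ceph /var/lib/ceph/bootstrap-osd
scp ceph-node-backup.tgz admin-host:        # "admin-host" is a placeholder
# reinstall the OS without touching the OSD data/journal partitions,
# restore the backup, reinstall the ceph packages, then start the OSDs
systemctl start ceph-osd.target             # or 'ceph-disk activate-all' on Jewel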

On Tue, Oct 30, 2018, 6:01 AM Luiz Gustavo Tonello <
gustavo.tone...@gmail.com> wrote:

> Thank you guys,
>
> It'll save me a bunch of time, because the process to reallocate OSD files
> is not so fast. :-)
>
>
>
> On Tue, Oct 30, 2018 at 6:15 AM Alexandru Cucu  wrote:
>
>> Don't forget about the cephx keyring if you are using cephx ;)
>>
>> Usually sits in:
>> /var/lib/ceph/bootstrap-osd/ceph.keyring
>>
>> ---
>> Alex
>>
>> On Tue, Oct 30, 2018 at 4:48 AM David Turner 
>> wrote:
>> >
>> > Set noout, reinstall the OS without touching the OSDs (including any
>> journal partitions and maintaining any dmcrypt keys if you have
>> encryption), install ceph, make sure the ceph.conf file is correct, then
>> start OSDs, unset noout once they're back up and in. All of the data the
>> OSD needs to start is on the OSD itself.
>> >
>> > On Mon, Oct 29, 2018, 6:52 PM Luiz Gustavo Tonello <
>> gustavo.tone...@gmail.com> wrote:
>> >>
>> >> Hi list,
>> >>
>> >> I have a situation that I need to reinstall the O.S. of a single node
>> in my OSD cluster.
>> >> This node has 4 OSDs configured, each one has ~4 TB used.
>> >>
>> >> The way that I'm thinking to proceed is to put OSD down (one each
>> time), stop the OSD, reinstall the O.S., and finally add the OSDs again.
>> >>
>> >> But I want to know if there's a way to do this in a more simple
>> process, maybe put OSD in maintenance (noout), reinstall the O.S. without
>> formatting my Storage volumes, install CEPH again and enable OSDs again.
>> >>
>> >> There's a way like these?
>> >>
>> >> I'm running CEPH Jewel.
>> >>
>> >> Best,
>> >> --
>> >> Luiz Gustavo P Tonello.
>> >> ___
>> >> ceph-users mailing list
>> >> ceph-users@lists.ceph.com
>> >> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>> >
>> > ___
>> > ceph-users mailing list
>> > ceph-users@lists.ceph.com
>> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>
>
>
> --
> Luiz Gustavo P Tonello.
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] is it right involving cap->session_caps without lock protection in the two functions ?

2018-10-30 Thread Yan, Zheng


> On Oct 30, 2018, at 18:10, ? ?  wrote:
> 
> Hello:
>  Recently, we encountered a kernel crash, and the reason we found by 
> analysing the vmcore dmesg is that list_add_tail(&cap->session_caps) in 
> __ceph_remove_cap went wrong, since cap->session_caps is NULL!
> So we reviewed the code paths that operate on cap->session_caps.
> We found these two functions involving cap->session_caps operations, but 
> there is no lock protection:
> (1) cleanup_cap_releases
> (2) ceph_send_cap_releases
> So we want to ask you: can a cap->session_caps operation race in a 
> multithreaded context without locking, or is it correct for these two 
> functions to operate on cap->session_caps without lock protection, and if 
> so, why?


They are protected by s_cap_lock. Which version of the kernel do you use?
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Reducing Max_mds

2018-10-30 Thread Rhian Resnick
John,


Thanks!


Rhian Resnick

Associate Director Research Computing

Enterprise Systems

Office of Information Technology


Florida Atlantic University

777 Glades Road, CM22, Rm 173B

Boca Raton, FL 33431

Phone 561.297.2647

Fax 561.297.0222




From: John Spray 
Sent: Tuesday, October 30, 2018 5:26 AM
To: Rhian Resnick
Cc: ceph-users@lists.ceph.com
Subject: Re: [ceph-users] Reducing Max_mds

On Tue, Oct 30, 2018 at 6:36 AM Rhian Resnick  wrote:
>
> Evening,
>
>
> I am looking to decrease our max mds servers as we had a server failure and 
> need to remove a node.
>
>
> When we attempt to decrease the number of mds servers from 5 to 4 (or any 
> other number) they never transition to standby. They just stay active.
>
>
> ceph fs set cephfs max_mds X

After you decrease max_mds, use "ceph mds deactivate <rank>" to bring
the actual number of active daemons in line with your new intended
maximum.

From Ceph 13.x that happens automatically, but since you're on 12.x it
needs doing by hand.

John

>
> Nothing looks useful in the mds or mon logs and I was wondering what you 
> recommend looking at?
>
>
> We are on 12.2.9 running Centos.
>
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] OSD node reinstallation

2018-10-30 Thread Luiz Gustavo Tonello
Thank you guys,

It'll save me a bunch of time, because the process to reallocate OSD files
is not so fast. :-)



On Tue, Oct 30, 2018 at 6:15 AM Alexandru Cucu  wrote:

> Don't forget about the cephx keyring if you are using cephx ;)
>
> Usually sits in:
> /var/lib/ceph/bootstrap-osd/ceph.keyring
>
> ---
> Alex
>
> On Tue, Oct 30, 2018 at 4:48 AM David Turner 
> wrote:
> >
> > Set noout, reinstall the OS without touching the OSDs (including any
> journal partitions and maintaining any dmcrypt keys if you have
> encryption), install ceph, make sure the ceph.conf file is correct, then
> start OSDs, unset noout once they're back up and in. All of the data the
> OSD needs to start is on the OSD itself.
> >
> > On Mon, Oct 29, 2018, 6:52 PM Luiz Gustavo Tonello <
> gustavo.tone...@gmail.com> wrote:
> >>
> >> Hi list,
> >>
> >> I have a situation that I need to reinstall the O.S. of a single node
> in my OSD cluster.
> >> This node has 4 OSDs configured, each one has ~4 TB used.
> >>
> >> The way that I'm thinking to proceed is to put OSD down (one each
> time), stop the OSD, reinstall the O.S., and finally add the OSDs again.
> >>
> >> But I want to know if there's a way to do this in a more simple
> process, maybe put OSD in maintenance (noout), reinstall the O.S. without
> formatting my Storage volumes, install CEPH again and enable OSDs again.
> >>
> >> There's a way like these?
> >>
> >> I'm running CEPH Jewel.
> >>
> >> Best,
> >> --
> >> Luiz Gustavo P Tonello.
> >> ___
> >> ceph-users mailing list
> >> ceph-users@lists.ceph.com
> >> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> >
> > ___
> > ceph-users mailing list
> > ceph-users@lists.ceph.com
> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>


-- 
Luiz Gustavo P Tonello.
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Reducing Max_mds

2018-10-30 Thread John Spray
On Tue, Oct 30, 2018 at 6:36 AM Rhian Resnick  wrote:
>
> Evening,
>
>
> I am looking to decrease our max mds servers as we had a server failure and 
> need to remove a node.
>
>
> When we attempt to decrease the number of mds servers from 5 to 4 (or any 
> other number) they never transition to standby. They just stay active.
>
>
> ceph fs set cephfs max_mds X

After you decrease max_mds, use "ceph mds deactivate <rank>" to bring
the actual number of active daemons in line with your new intended
maximum.

From Ceph 13.x that happens automatically, but since you're on 12.x it
needs doing by hand.
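
For this case that would look roughly like the below (a sketch, assuming the 
daemon to retire holds rank 4, the highest rank when going from 5 active MDSs 
to 4):

ceph fs set cephfs max_mds 4
ceph mds deactivate 4      # or "ceph mds deactivate cephfs:4"
ceph status                # wait for the rank to finish stopping before retiring another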

John

>
> Nothing looks useful in the mds or mon logs and I was wondering what you 
> recommend looking at?
>
>
> We are on 12.2.9 running Centos.
>
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] OSD node reinstallation

2018-10-30 Thread Alexandru Cucu
Don't forget about the cephx keyring if you are using cephx ;)

Usually sits in:
/var/lib/ceph/bootstrap-osd/ceph.keyring

---
Alex

On Tue, Oct 30, 2018 at 4:48 AM David Turner  wrote:
>
> Set noout, reinstall the OS without touching the OSDs (including any journal 
> partitions and maintaining any dmcrypt keys if you have encryption), install 
> ceph, make sure the ceph.conf file is correct, then start OSDs, unset noout 
> once they're back up and in. All of the data the OSD needs to start is on the 
> OSD itself.
>
> On Mon, Oct 29, 2018, 6:52 PM Luiz Gustavo Tonello 
>  wrote:
>>
>> Hi list,
>>
>> I have a situation that I need to reinstall the O.S. of a single node in my 
>> OSD cluster.
>> This node has 4 OSDs configured, each one has ~4 TB used.
>>
>> The way that I'm thinking to proceed is to put OSD down (one each time), 
>> stop the OSD, reinstall the O.S., and finally add the OSDs again.
>>
>> But I want to know if there's a way to do this in a more simple process, 
>> maybe put OSD in maintenance (noout), reinstall the O.S. without formatting 
>> my Storage volumes, install CEPH again and enable OSDs again.
>>
>> There's a way like these?
>>
>> I'm running CEPH Jewel.
>>
>> Best,
>> --
>> Luiz Gustavo P Tonello.
>> ___
>> ceph-users mailing list
>> ceph-users@lists.ceph.com
>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] ceph-deploy with a specified osd ID

2018-10-30 Thread Paul Emmerich
ceph-deploy doesn't support that. You can use ceph-disk or ceph-volume
directly (with basically the same syntax as ceph-deploy), but you can
only explicitly re-use an OSD id if you set it to destroyed before.

I.e., the proper way to replace an OSD while avoiding unnecessary data
movement is:
ceph osd destroy osd.XX
ceph-volume lvm prepare ... --osd-id XX

Also, check out "ceph osd purge" to remove OSDs with one simple step.
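
A rough sketch of that flow with ceph-volume (Luminous or later; the device 
path and id below are placeholders, and on Jewel the equivalent would have to 
be done with ceph-disk):

ceph osd destroy osd.17 --yes-i-really-mean-it
ceph-volume lvm prepare --data /dev/sdX --osd-id 17
ceph-volume lvm activate --all              # or activate just the recreated OSD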

Paul
On Mon, 29 Oct 2018 at 14:43, Jin Mao  wrote:
>
> Gents,
>
> My cluster had a gap in the OSD sequence numbers at a certain point. Basically, 
> because of a missing "osd auth del/rm" in a previous disk replacement task for 
> osd.17, a new osd.34 was created. It did not really bother me until recently 
> when I tried to replace all smaller disks with bigger disks.
>
> Ceph also seems to pick up the next available OSD sequence number. When I 
> replaced osd.18, the disk came up online as osd.17. When I did osd.19, it 
> became osd.18. It generated more backfill_wait PGs than sticking to the 
> original OSD number would have.
>
> Using ceph-deploy in version 10.2.3, is there a way to specify osd id when 
> doing osd activate?
>
> Thank you.
>
> Jin.
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



-- 
Paul Emmerich

Looking for help with your Ceph cluster? Contact us at https://croit.io

croit GmbH
Freseniusstr. 31h
81247 München
www.croit.io
Tel: +49 89 1896585 90
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Fwd: Ceph Meetup Cape Town

2018-10-30 Thread Thomas Bennett
Hi,

SARAO  is excited to announce that it will be
hosting a Ceph Meetup in Cape Town.

Date: Wednesday 28th November
Time: 5pm to 8pm
Venue: Workshop 17 at the V&A Waterfront

Space is limited, so if you would like to attend, please complete the
following form to register: https://goo.gl/forms/imuP47iCYssNMqHA2

Kind regards,
SARAO storage team

-- 
Thomas Bennett

SARAO
Science Data Processing
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] slow_used_bytes - SlowDB being used despite lots of space free in BlockDB on SSD?

2018-10-30 Thread Nick Fisk
> > >> On 10/18/2018 7:49 PM, Nick Fisk wrote:
> > >>> Hi,
> > >>>
> > >>> Ceph Version = 12.2.8
> > >>> 8TB spinner with 20G SSD partition
> > >>>
> > >>> Perf dump shows the following:
> > >>>
> > >>> "bluefs": {
> > >>>   "gift_bytes": 0,
> > >>>   "reclaim_bytes": 0,
> > >>>   "db_total_bytes": 21472731136,
> > >>>   "db_used_bytes": 3467640832,
> > >>>   "wal_total_bytes": 0,
> > >>>   "wal_used_bytes": 0,
> > >>>   "slow_total_bytes": 320063143936,
> > >>>   "slow_used_bytes": 4546625536,
> > >>>   "num_files": 124,
> > >>>   "log_bytes": 11833344,
> > >>>   "log_compactions": 4,
> > >>>   "logged_bytes": 316227584,
> > >>>   "files_written_wal": 2,
> > >>>   "files_written_sst": 4375,
> > >>>   "bytes_written_wal": 204427489105,
> > >>>   "bytes_written_sst": 248223463173
> > >>>
> > >>> Am I reading that correctly, about 3.4GB used out of 20GB on the SSD, 
> > >>> yet 4.5GB of DB is stored on the spinning disk?
> > >> Correct. Most probably the rationale for this is the layered scheme
> > >> RocksDB uses to keep its sst. For each level It has a maximum
> > >> threshold (determined by level no, some base value and
> > >> corresponding multiplier - see max_bytes_for_level_base &
> > >> max_bytes_for_level_multiplier at
> > >> https://github.com/facebook/rocksdb/wiki/RocksDB-Tuning-Guide)
> > >> If the next level  (at its max size) doesn't fit into the space 
> > >> available at DB volume - it's totally spilled over to slow device.
> > >> IIRC level_base is about 250MB and multiplier is 10 so the third level 
> > >> needs 25Gb and hence doesn't fit into your DB volume.
> > >>
> > >> In fact  DB volume of 20GB is VERY small for 8TB OSD - just 0.25% of the 
> > >> slow one. AFAIR current recommendation is about 4%.
> > >>
> > > Thanks Igor, these nodes were designed back in the filestore days
> > > where Small 10DWPD SSD's were all the rage, I might be able to
> > shrink the OS/swap partition and get each DB partition up to 25/26GB,
> > they are not going to get any bigger than that as that’s the NVME
> > completely filled. But I'm then going have to effectively wipe all the 
> > disks I've done so far and re-backfill. ☹ Are there any tunables to
> change this behaviour post OSD deployment to move data back onto SSD?
> > None I'm aware of.
> >
> > However I've just completed development for offline BlueFS volume
> > migration feature within ceph-bluestore-tool. It allows DB/WAL volumes
> > allocation and resizing as well as moving BlueFS data between volumes (with 
> > some limitations unrelated to your case). Hence one
> doesn't need slow backfilling to adjust BlueFS volume configuration.
> > Here is the PR (Nautilus only for now):
> > https://github.com/ceph/ceph/pull/23103
> 
> That sounds awesome, I might look at leaving the current OSD's how they are 
> and look to "fix" them when Nautilus comes out.
> 
> >
> > >
> > > On a related note, does frequently accessed data move into the SSD,
> > > or is the overspill a one way ticket? I would assume writes
> > would cause data in rocksdb to be written back into L0 and work its way 
> > down, but I'm not sure about reads?
> > AFAIK reads don't trigger any data layout changes.
> >
> 
> 
> 
> > >
> > > So I think the lesson from this is that despite whatever DB usage
> > > you may think you may end up with, always make sure your SSD
> > partition is bigger than 26GB (L0+L1)?
> > In fact that's
> > L0+L1 (2x250Mb), L2(2500MB), L3(25000MB) which is about 28GB.
> 
> Well I upgraded a new node and after shrinking the OS, I managed to assign 
> 29GB as the DB's. It's just finished backfilling and
> disappointingly it looks like the DB has over spilled onto the disks ☹ So the 
> magic minimum number is going to be somewhere between
> 30GB and 40GB. I might be able to squeeze 30G partitions out if I go for a 
> tiny OS disk and no swap. Will try that on the next one.
> Hoping that 30G does it.
> 

Mark, looping you in as we were talking about this last Thursday.

So it looks like the magic size is 30G. I re-created a single OSD with a 30G DB 
partition and after backfilling all data is now stored on the SSD. Perf dump 
below showing difference between 29G and 30G partitions:

30G
"db_total_bytes": 32210149376,
"db_used_bytes": 7182745600,
"slow_total_bytes": 320063143936,
"slow_used_bytes": 0,

29G
"db_total_bytes": 31136407552,
"db_used_bytes": 3696230400,
"slow_total_bytes": 320063143936,
"slow_used_bytes": 5875171328,

So it seems the minimum size for the SSD partition should be 30G, unless you have 
<1TB spinning disks, which might fit in a 3G partition. 30G should cover most RBD 
workloads up to pretty large disks (8TB in my example). RGW workloads I'm 
guessing are most at risk of having larger DB requirements, and so probably the 
next minimum size would be just over 
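
As a sanity check on that threshold, the level sizes quoted above add up like 
this (a back-of-the-envelope sketch using the ~250MB base and 10x multiplier 
mentioned earlier; the exact RocksDB defaults may differ slightly):

# L0+L1 ~ 2 x 250 MB, L2 ~ 10 x 250 MB, L3 ~ 100 x 250 MB
echo "$(( 2*250 + 10*250 + 100*250 )) MB"   # -> 28000 MB, i.e. roughly 28 GB
# so the DB partition needs to be a bit above 28 GB for L3 to stay on the SSD,
# which matches 29 GB spilling over while 30 GB does not.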

[ceph-users] Reducing Max_mds

2018-10-30 Thread Rhian Resnick
Evening,


I am looking to decrease our max mds servers as we had a server failure and 
need to remove a node.


When we attempt to decrease the number of mds servers from 5 to 4 (or any other 
number) they never transition to standby. They just stay active.


ceph fs set cephfs max_mds X


Nothing looks useful in the mds or mon logs and I was wondering what you 
recommend looking at?


We are on 12.2.9 running Centos.

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com