Re: [ceph-users] Error Creating OSD

2018-04-14 Thread Rhian Resnick
Afternoon,


Happily, I resolved this issue.


Running vgdisplay showed that ceph-volume had tried to create a volume group on a 
failed disk. (We didn't know we had a bad disk, so this was new information to 
us.) When the command failed, it left behind three bad volume groups. Since you 
cannot rename them, you need to use the following commands to delete them.


vgdisplay  # to find the bad volume groups

vgremove --select vg_uuid=<your uuid> -f  # -f forces the removal
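
A sketch of the cleanup described above. The VG name and UUID below are
placeholders, not values from our cluster; run vgdisplay first and double-check
which VG sits on the failed disk before forcing removal.

```shell
# List all volume groups with their UUIDs; stale ceph-volume VGs are
# named like "ceph-<fsid>".
vgs -o vg_name,vg_uuid

# Show full details to confirm which VG lives on the failed disk.
vgdisplay

# Remove the stale VG by UUID (placeholder UUID below); -f forces
# removal even though the underlying PV may be unreadable.
vgremove --select vg_uuid=AbCdEf-1234-abcd-efgh-ijkl-mnop-qrstuv -f
```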


Rhian Resnick

Associate Director Middleware and HPC

Office of Information Technology


Florida Atlantic University

777 Glades Road, CM22, Rm 173B

Boca Raton, FL 33431

Phone 561.297.2647

Fax 561.297.0222




From: Rhian Resnick
Sent: Saturday, April 14, 2018 12:47 PM
To: Alfredo Deza
Cc: ceph-users@lists.ceph.com
Subject: Re: [ceph-users] Error Creating OSD


Thanks all,


Here is a link to our command being executed: https://pastebin.com/iy8iSaKH



Here are the results from the command


Executed with debug enabled (after a zap with destroy)


[root@ceph-storage3 ~]# ceph-volume lvm create --bluestore --data /dev/sdu
Running command: ceph-authtool --gen-print-key
Running command: ceph --cluster ceph --name client.bootstrap-osd --keyring 
/var/lib/ceph/bootstrap-osd/ceph.keyring -i - osd new 
664894a8-530a-4557-b2f4-1af5b391f2b7
--> Was unable to complete a new OSD, will rollback changes
--> OSD will be fully purged from the cluster, because the ID was generated
Running command: ceph osd purge osd.140 --yes-i-really-mean-it
 stderr: purged osd.140
Traceback (most recent call last):
  File "/sbin/ceph-volume", line 6, in <module>
main.Volume()
  File "/usr/lib/python2.7/site-packages/ceph_volume/main.py", line 37, in 
__init__
self.main(self.argv)
  File "/usr/lib/python2.7/site-packages/ceph_volume/decorators.py", line 59, 
in newfunc
return f(*a, **kw)
  File "/usr/lib/python2.7/site-packages/ceph_volume/main.py", line 153, in main
terminal.dispatch(self.mapper, subcommand_args)
  File "/usr/lib/python2.7/site-packages/ceph_volume/terminal.py", line 182, in 
dispatch
instance.main()
  File "/usr/lib/python2.7/site-packages/ceph_volume/devices/lvm/main.py", line 
38, in main
terminal.dispatch(self.mapper, self.argv)
  File "/usr/lib/python2.7/site-packages/ceph_volume/terminal.py", line 182, in 
dispatch
instance.main()
  File "/usr/lib/python2.7/site-packages/ceph_volume/devices/lvm/create.py", 
line 74, in main
self.create(args)
  File "/usr/lib/python2.7/site-packages/ceph_volume/decorators.py", line 16, 
in is_root
return func(*a, **kw)
  File "/usr/lib/python2.7/site-packages/ceph_volume/devices/lvm/create.py", 
line 26, in create
prepare_step.safe_prepare(args)
  File "/usr/lib/python2.7/site-packages/ceph_volume/devices/lvm/prepare.py", 
line 217, in safe_prepare
self.prepare(args)
  File "/usr/lib/python2.7/site-packages/ceph_volume/decorators.py", line 16, 
in is_root
return func(*a, **kw)
  File "/usr/lib/python2.7/site-packages/ceph_volume/devices/lvm/prepare.py", 
line 283, in prepare
block_lv = self.prepare_device(args.data, 'block', cluster_fsid, osd_fsid)
  File "/usr/lib/python2.7/site-packages/ceph_volume/devices/lvm/prepare.py", 
line 193, in prepare_device
if api.get_vg(vg_name=vg_name):
  File "/usr/lib/python2.7/site-packages/ceph_volume/api/lvm.py", line 334, in 
get_vg
return vgs.get(vg_name=vg_name, vg_tags=vg_tags)
  File "/usr/lib/python2.7/site-packages/ceph_volume/api/lvm.py", line 429, in 
get
raise MultipleVGsError(vg_name)
ceph_volume.exceptions.MultipleVGsError: Got more than 1 result looking for 
volume group: ceph-6a2e8f21-bca2-492b-8869-eecc995216cc







From: Alfredo Deza 
Sent: Saturday, April 14, 2018 8:45 AM
To: Rhian Resnick
Cc: ceph-users@lists.ceph.com
Subject: Re: [ceph-users] Error Creating OSD



On Fri, Apr 13, 2018 at 8:20 PM, Rhian Resnick 
> wrote:

Evening,

When attempting to create an OSD we receive the following error.

[ceph-admin@ceph-storage3 ~]$ sudo ceph-volume lvm create --bluestore --data 
/dev/sdu
Running command: ceph-authtool --gen-print-key
Running command: ceph --cluster ceph --name client.bootstrap-osd --keyring 
/var/lib/ceph/bootstrap-osd/ceph.keyring -i - osd new 
c8cb8cff-dad9-48b8-8d77-6f130a4b629d
--> Was unable to complete a new OSD, will rollback changes
--> OSD will be fully purged from the cluster, because the ID was generated
Running command: ceph osd purge osd.140 --yes-i-really-mean-it
 stderr: purged osd.140
-->  MultipleVGsError: Got more than 

[ceph-users] Fixing bad radosgw index

2018-04-14 Thread Robert Stanford
 I deleted my default.rgw.buckets.data and default.rgw.buckets.index pools
in an attempt to clean them out.  I brought this up on the list and
received replies telling me essentially, "You shouldn't do that."  There
was however no helpful advice on recovering.

 When I run 'radosgw-admin bucket list' I get a list of all my old buckets
(I thought they'd be cleaned out when I deleted and recreated
default.rgw.buckets.index, but I was wrong.)  Deleting them with s3cmd and
radosgw-admin does nothing; they still appear (though s3cmd will give a
'404' error.)  Running radosgw-admin with 'bucket check' and '--fix' does
nothing as well.  So, how do I get myself out of this mess?
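
For reference, the diagnostic and cleanup attempts described above, in command
form. The bucket name is a placeholder, and whether these clear stale entries
after the index pool was recreated is exactly the open question here.

```shell
# Stale bucket entries are still listed here even after pool recreation.
radosgw-admin bucket list

# The bucket metadata entries, which survive independently of the index pool.
radosgw-admin metadata list bucket

# Attempted index repair (reported as having no effect in this case).
radosgw-admin bucket check --bucket=mybucket --fix

# Forced removal of a bucket and its objects.
radosgw-admin bucket rm --bucket=mybucket --purge-objects
```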

 On another, semi-related note, I've been deleting (existing) buckets and
their contents with s3cmd (and --recursive); the space is never freed from
ceph and the bucket still appears in s3cmd ls.  Looks like my radosgw has
several issues, maybe all related to deleting and recreating the pools.

 Thanks
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Error Creating OSD

2018-04-14 Thread Rhian Resnick
Thanks all,


Here is a link to our command being executed: https://pastebin.com/iy8iSaKH



Here are the results from the command


Executed with debug enabled (after a zap with destroy)


[root@ceph-storage3 ~]# ceph-volume lvm create --bluestore --data /dev/sdu
Running command: ceph-authtool --gen-print-key
Running command: ceph --cluster ceph --name client.bootstrap-osd --keyring 
/var/lib/ceph/bootstrap-osd/ceph.keyring -i - osd new 
664894a8-530a-4557-b2f4-1af5b391f2b7
--> Was unable to complete a new OSD, will rollback changes
--> OSD will be fully purged from the cluster, because the ID was generated
Running command: ceph osd purge osd.140 --yes-i-really-mean-it
 stderr: purged osd.140
Traceback (most recent call last):
  File "/sbin/ceph-volume", line 6, in <module>
main.Volume()
  File "/usr/lib/python2.7/site-packages/ceph_volume/main.py", line 37, in 
__init__
self.main(self.argv)
  File "/usr/lib/python2.7/site-packages/ceph_volume/decorators.py", line 59, 
in newfunc
return f(*a, **kw)
  File "/usr/lib/python2.7/site-packages/ceph_volume/main.py", line 153, in main
terminal.dispatch(self.mapper, subcommand_args)
  File "/usr/lib/python2.7/site-packages/ceph_volume/terminal.py", line 182, in 
dispatch
instance.main()
  File "/usr/lib/python2.7/site-packages/ceph_volume/devices/lvm/main.py", line 
38, in main
terminal.dispatch(self.mapper, self.argv)
  File "/usr/lib/python2.7/site-packages/ceph_volume/terminal.py", line 182, in 
dispatch
instance.main()
  File "/usr/lib/python2.7/site-packages/ceph_volume/devices/lvm/create.py", 
line 74, in main
self.create(args)
  File "/usr/lib/python2.7/site-packages/ceph_volume/decorators.py", line 16, 
in is_root
return func(*a, **kw)
  File "/usr/lib/python2.7/site-packages/ceph_volume/devices/lvm/create.py", 
line 26, in create
prepare_step.safe_prepare(args)
  File "/usr/lib/python2.7/site-packages/ceph_volume/devices/lvm/prepare.py", 
line 217, in safe_prepare
self.prepare(args)
  File "/usr/lib/python2.7/site-packages/ceph_volume/decorators.py", line 16, 
in is_root
return func(*a, **kw)
  File "/usr/lib/python2.7/site-packages/ceph_volume/devices/lvm/prepare.py", 
line 283, in prepare
block_lv = self.prepare_device(args.data, 'block', cluster_fsid, osd_fsid)
  File "/usr/lib/python2.7/site-packages/ceph_volume/devices/lvm/prepare.py", 
line 193, in prepare_device
if api.get_vg(vg_name=vg_name):
  File "/usr/lib/python2.7/site-packages/ceph_volume/api/lvm.py", line 334, in 
get_vg
return vgs.get(vg_name=vg_name, vg_tags=vg_tags)
  File "/usr/lib/python2.7/site-packages/ceph_volume/api/lvm.py", line 429, in 
get
raise MultipleVGsError(vg_name)
ceph_volume.exceptions.MultipleVGsError: Got more than 1 result looking for 
volume group: ceph-6a2e8f21-bca2-492b-8869-eecc995216cc







From: Alfredo Deza 
Sent: Saturday, April 14, 2018 8:45 AM
To: Rhian Resnick
Cc: ceph-users@lists.ceph.com
Subject: Re: [ceph-users] Error Creating OSD



On Fri, Apr 13, 2018 at 8:20 PM, Rhian Resnick 
> wrote:

Evening,

When attempting to create an OSD we receive the following error.

[ceph-admin@ceph-storage3 ~]$ sudo ceph-volume lvm create --bluestore --data 
/dev/sdu
Running command: ceph-authtool --gen-print-key
Running command: ceph --cluster ceph --name client.bootstrap-osd --keyring 
/var/lib/ceph/bootstrap-osd/ceph.keyring -i - osd new 
c8cb8cff-dad9-48b8-8d77-6f130a4b629d
--> Was unable to complete a new OSD, will rollback changes
--> OSD will be fully purged from the cluster, because the ID was generated
Running command: ceph osd purge osd.140 --yes-i-really-mean-it
 stderr: purged osd.140
-->  MultipleVGsError: Got more than 1 result looking for volume group: 
ceph-6a2e8f21-bca2-492b-8869-eecc995216cc

Any hints on what to do? This occurs whenever we attempt to create OSDs on this 
node.

Can you use a paste site and get the /var/log/ceph/ceph-volume.log contents? 
Also, if you could try the same command but with:

CEPH_VOLUME_DEBUG=1

I think you are hitting two issues here:

1) Somehow `osd new` is not completing and failing
2) The `purge` command to wipe out the LV is getting multiple LV's and cannot 
make sure to match the one it used.

#2 definitely looks like something we are doing wrong, and #1 can have a lot of 
different causes. The logs would be tremendously helpful!



Re: [ceph-users] ceph mds memory usage 20GB : is it normal ?

2018-04-14 Thread Alexandre DERUMIER
Hi,

Still leaking again after update to 12.2.4, around 17G after 9 days




USER PID %CPU %MEMVSZ   RSS TTY  STAT START   TIME COMMAND

ceph  629903 50.7 25.9 17473680 17082432 ?   Ssl  avril05 6498:21 
/usr/bin/ceph-mds -f --cluster ceph --id ceph4-1.odiso.net --setuser ceph 
--setgroup ceph





~# ceph daemon mds.ceph4-1.odiso.net cache status
{
"pool": {
"items": 16019302,
"bytes": 5100941968
}
}
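
The gap between what the MDS cache accounts for and the resident set can be
computed from the two outputs above. A small sketch, with the figures
hard-coded from this report (cache status bytes, and the ps RSS column, which
is in KiB):

```python
import json

# "cache status" output as reported above: bytes held by the MDS cache pool.
cache_status = json.loads('{"pool": {"items": 16019302, "bytes": 5100941968}}')

# RSS from the ps output above, reported in KiB.
rss_kib = 17082432

cache_gib = cache_status["pool"]["bytes"] / 2**30
rss_gib = rss_kib * 1024 / 2**30

# Memory the process holds beyond what the cache pool accounts for --
# roughly the "leak" being discussed.
unaccounted_gib = rss_gib - cache_gib
print(round(cache_gib, 1), round(rss_gib, 1), round(unaccounted_gib, 1))
# -> 4.8 16.3 11.5
```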





# ceph daemon mds.ceph4-1.odiso.net perf dump
{
"AsyncMessenger::Worker-0": {
"msgr_recv_messages": 648541059,
"msgr_send_messages": 666102301,
"msgr_recv_bytes": 4943336751206,
"msgr_send_bytes": 868468165048,
"msgr_created_connections": 167,
"msgr_active_connections": 166,
"msgr_running_total_time": 33884.943400671,
"msgr_running_send_time": 12229.226645264,
"msgr_running_recv_time": 26234.680757843,
"msgr_running_fast_dispatch_time": 4650.248980986
},
"AsyncMessenger::Worker-1": {
"msgr_recv_messages": 732301444,
"msgr_send_messages": 750526966,
"msgr_recv_bytes": 4248782228635,
"msgr_send_bytes": 2379403291660,
"msgr_created_connections": 172,
"msgr_active_connections": 171,
"msgr_running_total_time": 38490.093448635,
"msgr_running_send_time": 14692.222019414,
"msgr_running_recv_time": 31000.304091618,
"msgr_running_fast_dispatch_time": 3945.573521893
},
"AsyncMessenger::Worker-2": {
"msgr_recv_messages": 503228767,
"msgr_send_messages": 485729577,
"msgr_recv_bytes": 3644656184942,
"msgr_send_bytes": 526380645708,
"msgr_created_connections": 156,
"msgr_active_connections": 156,
"msgr_running_total_time": 26566.051442840,
"msgr_running_send_time": 9335.249687474,
"msgr_running_recv_time": 22643.927960456,
"msgr_running_fast_dispatch_time": 3426.566334706
},
"finisher-PurgeQueue": {
"queue_len": 0,
"complete_latency": {
"avgcount": 2077128,
"sum": 10029.468276512,
"avgtime": 0.004828526
}
},
"mds": {
"request": 1320419754,
"reply": 1320418963,
"reply_latency": {
"avgcount": 1320418963,
"sum": 3567340.917522550,
"avgtime": 0.002701673
},
"forward": 0,
"dir_fetch": 95955541,
"dir_commit": 5380286,
"dir_split": 29080,
"dir_merge": 28453,
"inode_max": 2147483647,
"inodes": 2049324,
"inodes_top": 55759,
"inodes_bottom": 118910,
"inodes_pin_tail": 1874655,
"inodes_pinned": 1969667,
"inodes_expired": 14225864524,
"inodes_with_caps": 1969030,
"caps": 3010600,
"subtrees": 2,
"traverse": 1433042396,
"traverse_hit": 855810795,
"traverse_forward": 0,
"traverse_discover": 0,
"traverse_dir_fetch": 75553963,
"traverse_remote_ino": 5462,
"traverse_lock": 217,
"load_cent": 132079451933,
"q": 41,
"exported": 0,
"exported_inodes": 0,
"imported": 0,
"imported_inodes": 0
},
"mds_cache": {
"num_strays": 150,
"num_strays_delayed": 0,
"num_strays_enqueuing": 0,
"strays_created": 2317004,
"strays_enqueued": 2316671,
"strays_reintegrated": 288,
"strays_migrated": 0,
"num_recovering_processing": 0,
"num_recovering_enqueued": 0,
"num_recovering_prioritized": 0,
"recovery_started": 0,
"recovery_completed": 0,
"ireq_enqueue_scrub": 0,
"ireq_exportdir": 0,
"ireq_flush": 0,
"ireq_fragmentdir": 57533,
"ireq_fragstats": 0,
"ireq_inodestats": 0
},
"mds_log": {
"evadd": 293928039,
"evex": 293928281,
"evtrm": 293926233,
"ev": 26595,
"evexg": 0,
"evexd": 2048,
"segadd": 365381,
"segex": 365382,
"segtrm": 365380,
"seg": 32,
"segexg": 0,
"segexd": 2,
"expos": 4997676796422,
"wrpos": 4997732797135,
"rdpos": 4232612352311,
"jlat": {
"avgcount": 62629276,
"sum": 260619.838247062,
"avgtime": 0.004161310
},
"replayed": 24789
},
"mds_mem": {
"ino": 2048405,
"ino+": 14160488289,
"ino-": 14158439884,
"dir": 377882,
"dir+": 15421679,
"dir-": 15043797,
"dn": 2049614,
"dn+": 14231703198,
"dn-": 14229653584,
"cap": 3010600,
"cap+": 1555206662,
"cap-": 1552196062,
"rss": 17082432,
"heap": 313916,
"buf": 0
},
"mds_server": {
"dispatch_client_request": 1437033326,
"dispatch_server_request": 0,

Re: [ceph-users] Error Creating OSD

2018-04-14 Thread Alfredo Deza
On Fri, Apr 13, 2018 at 8:20 PM, Rhian Resnick  wrote:

> Evening,
>
> When attempting to create an OSD we receive the following error.
>
> [ceph-admin@ceph-storage3 ~]$ sudo ceph-volume lvm create --bluestore
> --data /dev/sdu
> Running command: ceph-authtool --gen-print-key
> Running command: ceph --cluster ceph --name client.bootstrap-osd --keyring
> /var/lib/ceph/bootstrap-osd/ceph.keyring -i - osd new
> c8cb8cff-dad9-48b8-8d77-6f130a4b629d
> --> Was unable to complete a new OSD, will rollback changes
> --> OSD will be fully purged from the cluster, because the ID was generated
> Running command: ceph osd purge osd.140 --yes-i-really-mean-it
>  stderr: purged osd.140
> -->  MultipleVGsError: Got more than 1 result looking for volume group:
> ceph-6a2e8f21-bca2-492b-8869-eecc995216cc
>
> Any hints on what to do? This occurs when we attempt to create osd's on
> this node.
>

Can you use a paste site and get the /var/log/ceph/ceph-volume.log
contents? Also, if you could try the same command but with:

CEPH_VOLUME_DEBUG=1
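
That is, the variable set inline for a single run, so it applies only to that
invocation and not the whole shell session:

```shell
# Re-run the failing create with ceph-volume's debug output enabled
# for this one command.
CEPH_VOLUME_DEBUG=1 ceph-volume lvm create --bluestore --data /dev/sdu
```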

I think you are hitting two issues here:

1) Somehow `osd new` is not completing and failing
2) The `purge` command to wipe out the LV is getting multiple LV's and
cannot make sure to match the one it used.

#2 definitely looks like something we are doing wrong, and #1 can have a
lot of different causes. The logs would be tremendously helpful!

>
> Rhian Resnick
>
> Associate Director Middleware and HPC
>
> Office of Information Technology
>
>
> Florida Atlantic University
>
> 777 Glades Road, CM22, Rm 173B
>
> Boca Raton, FL 33431
>
> Phone 561.297.2647
>
> Fax 561.297.0222
>
>  [image: image] 
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
>


Re: [ceph-users] rbd-nbd not resizing even after kernel tweaks

2018-04-14 Thread Jason Dillaman
Great, thanks for the update.

Jason

On Fri, Apr 13, 2018 at 11:06 PM, Alex Gorbachev  
wrote:
> On Thu, Apr 12, 2018 at 9:38 AM, Alex Gorbachev  
> wrote:
>> On Thu, Apr 12, 2018 at 7:57 AM, Jason Dillaman  wrote:
>>> If you run "partprobe" after you resize in your second example, is the
>>> change visible in "parted"?
>>
>> No, partprobe does not help:
>>
>> root@lumd1:~# parted /dev/nbd2 p
>> Model: Unknown (unknown)
>> Disk /dev/nbd2: 2147MB
>> Sector size (logical/physical): 512B/512B
>> Partition Table: loop
>> Disk Flags:
>>
>> Number  Start  End SizeFile system  Flags
>>  1  0.00B  2147MB  2147MB  xfs
>>
>> root@lumd1:~# partprobe
>> root@lumd1:~# parted /dev/nbd2 p
>> Model: Unknown (unknown)
>> Disk /dev/nbd2: 2147MB
>> Sector size (logical/physical): 512B/512B
>> Partition Table: loop
>> Disk Flags:
>>
>> Number  Start  End SizeFile system  Flags
>>  1  0.00B  2147MB  2147MB  xfs
>>
>>
>>
>>>
>>> On Wed, Apr 11, 2018 at 11:01 PM, Alex Gorbachev  
>>> wrote:
 On Wed, Apr 11, 2018 at 2:13 PM, Jason Dillaman  
 wrote:
> I've tested the patch on both 4.14.0 and 4.16.0 and it appears to
> function correctly for me. parted can see the newly added free-space
> after resizing the RBD image and our stress tests once again pass
> successfully. Do you have any additional details on the issues you are
> seeing?

 I recompiled again with 4.14-24 and tested, the resize shows up OK
 when the filesystem is not mounted.  dmesg shows also the "detected
 capacity change" message.  However, if I create a filesystem and mount
 it, the capacity change is no longer detected.  Steps as follows:

 root@lumd1:~# rbd create -s 1024 --image-format 2 matte/n4
 root@lumd1:~# rbd-nbd map matte/n4
 /dev/nbd2
 root@lumd1:~# mkfs.xfs /dev/nbd2
 meta-data=/dev/nbd2  isize=512agcount=4, agsize=65536 blks
  =   sectsz=512   attr=2, projid32bit=1
  =   crc=1finobt=1, sparse=0
 data =   bsize=4096   blocks=262144, imaxpct=25
  =   sunit=0  swidth=0 blks
 naming   =version 2  bsize=4096   ascii-ci=0 ftype=1
 log  =internal log   bsize=4096   blocks=2560, version=2
  =   sectsz=512   sunit=0 blks, lazy-count=1
 realtime =none   extsz=4096   blocks=0, rtextents=0
 root@lumd1:~# parted /dev/nbd2 p
 Model: Unknown (unknown)
 Disk /dev/nbd2: 1074MB
 Sector size (logical/physical): 512B/512B
 Partition Table: loop
 Disk Flags:

 Number  Start  End SizeFile system  Flags
  1  0.00B  1074MB  1074MB  xfs

 root@lumd1:~# rbd resize --pool matte --image n4 --size 2048
 Resizing image: 100% complete...done.
 root@lumd1:~# parted /dev/nbd2 p
 Model: Unknown (unknown)
 Disk /dev/nbd2: 2147MB
 Sector size (logical/physical): 512B/512B
 Partition Table: loop
 Disk Flags:

 Number  Start  End SizeFile system  Flags
  1  0.00B  2147MB  2147MB  xfs

 -- All is well so far, now let's mount the fs

 root@lumd1:~# mount /dev/nbd2 /mnt
 root@lumd1:~# rbd resize --pool matte --image n4 --size 3072
 Resizing image: 100% complete...done.
 root@lumd1:~# parted /dev/nbd2 p
 Model: Unknown (unknown)
 Disk /dev/nbd2: 2147MB
 Sector size (logical/physical): 512B/512B
 Partition Table: loop
 Disk Flags:

 Number  Start  End SizeFile system  Flags
  1  0.00B  2147MB  2147MB  xfs

 -- Now the change is not detected
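
One way to cross-check the kernel-visible size, independent of parted's cached
view, is to open the device and seek to the end. A sketch, demonstrated on a
sparse temp file standing in for /dev/nbd2 (reading the real device needs
root):

```python
import os
import tempfile

def device_size(path):
    """Return the size in bytes the kernel reports for a block device or file."""
    fd = os.open(path, os.O_RDONLY)
    try:
        return os.lseek(fd, 0, os.SEEK_END)
    finally:
        os.close(fd)

# A sparse 2 GiB file plays the role of the nbd device here.
with tempfile.NamedTemporaryFile() as f:
    f.truncate(2 * 1024**3)
    print(device_size(f.name))  # 2147483648
```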


>
> On Wed, Apr 11, 2018 at 12:06 PM, Jason Dillaman  
> wrote:
>> I'll give it a try locally and see if I can figure it out. Note that
>> this commit [1] also dropped the call to "bd_set_size" within
>> "nbd_size_update", which seems suspicious to me at initial glance.
>>
>> [1] 
>> https://github.com/torvalds/linux/commit/29eaadc0364943b6352e8994158febcb699c9f9b#diff-bc9273bcb259fef182ae607a1d06a142L180
>>
>> On Wed, Apr 11, 2018 at 11:09 AM, Alex Gorbachev 
>>  wrote:
 On Wed, Apr 11, 2018 at 10:27 AM, Alex Gorbachev 
  wrote:
> On Wed, Apr 11, 2018 at 2:43 AM, Mykola Golub 
>  wrote:
>> On Tue, Apr 10, 2018 at 11:14:58PM -0400, Alex Gorbachev wrote:
>>
>>> So Josef fixed the one issue that enables e.g. lsblk and sysfs size 
>>> to
>>> reflect the correct siz on change.  However, partptobe and parted
>>> still do not detect the change, complete unmap and remap of rbd-nbd
>>> device and