Re: [ceph-users] repomd.xml: [Errno 14] HTTP Error 404 - Not Found on download.ceph.com for rhel7

2016-07-08 Thread Alexander Lim
Yes, happened to me too. Simply changing the URL fixed the problem.
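In case it helps, on an already-installed node this is a one-liner (path assumes the stock repo file shipped by the ceph-release package):

sed -i 's#rpm-hammer/rhel7#rpm-hammer/el7#g' /etc/yum.repos.d/ceph.repo
yum clean metadata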

On Fri, Jul 8, 2016 at 6:55 PM, Martin Palma  wrote:

> It seems that the packages "ceph-release-*.noarch.rpm" contain a
> ceph.repo pointing to the baseurl
> "http://ceph.com/rpm-hammer/rhel7/$basearch; which does not exist. It
> should probably point to "http://ceph.com/rpm-hammer/el7/$basearch;.
>
> - Martin
>
> On Thu, Jul 7, 2016 at 5:57 PM, Martin Palma  wrote:
> > Hi All,
> >
> > it seems that the "rhel7" folder/symlink on
> > "download.ceph.com/rpm-hammer" does not exist anymore therefore
> > ceph-deploy fails to deploy a new cluster. Just tested it by setting
> > up a new lab environment.
> >
> > We have the same issue on our production cluster currently, which
> keeps us from updating it. A simple fix would be to change the URL to
> > "download.ceph.com/rpm-hammer/el7/..."  in the repo files I guess
> >
> > Any thoughts on that?
> >
> > We are running on CentOS 7.2.
> >
> > Best,
> > Martin
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Active MON aborts on Jewel 10.2.2 with FAILED assert(info.state == MDSMap::STATE_STANDBY)

2016-07-08 Thread Bill Sharer
Just for giggles I tried the rolling upgrade to 10.2.2 again today.  
This time I rolled mon.0 and osd.0 first while keeping the mds servers 
up and then rolled them before moving on to the other three.  No 
assertion failure this time since I guess I always had an mds active.  I 
wonder if I will have a problem though if I do a complete cold start of 
the cluster.
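For a cold start, a quick sanity check before restarting the next daemon would be to confirm an MDS actually reports active, e.g. (output format differs a bit between releases):

ceph mds stat
ceph -s | grep -i mds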


Bill Sharer


On 07/06/2016 04:19 PM, Bill Sharer wrote:
Manual downgrade to 10.2.0 put me back in business.  I'm going to mask 
10.2.2 and then try to let 10.2.1 emerge.


Bill Sharer

On 07/06/2016 02:16 PM, Bill Sharer wrote:
I noticed on that USE list that the 10.2.2 ebuild introduced a new 
cephfs emerge flag, so I enabled that and emerged everywhere again.  
The active mon is still crashing on the assertion though.



Bill Sharer

On 07/05/2016 08:14 PM, Bill Sharer wrote:

Relevant USE flags FWIW

# emerge -pv ceph

These are the packages that would be merged, in order:

Calculating dependencies... done!
[ebuild   R   ~] sys-cluster/ceph-10.2.2::gentoo  USE="fuse gtk 
jemalloc ldap libaio libatomic nss radosgw static-libs xfs 
-babeltrace -cephfs -cryptopp -debug -lttng -tcmalloc {-test} -zfs" 
PYTHON_TARGETS="python2_7 python3_4 -python3_5" 11,271 KiB



Bill Sharer


On 07/05/2016 01:45 PM, Gregory Farnum wrote:

Thanks for the report; created a ticket and somebody will get on it
shortly. http://tracker.ceph.com/issues/16592
-Greg

On Sun, Jul 3, 2016 at 5:55 PM, Bill Sharer 
 wrote:
I was working on a rolling upgrade on Gentoo to Jewel 10.2.2 from 10.2.0.
However, now I can't get a monitor quorum going again, because as soon as I
get one, the mon which wins the election blows out with an assertion
failure.  Here's my status at the moment:

kroll1   10.2.2   ceph mon.0 and ceph osd.0   (normally my lead mon)
kroll2   10.2.2   ceph mon 1 and ceph osd 2
kroll3   10.2.2   ceph osd 1
kroll4   10.2.2   ceph mon 3 and ceph osd 3
kroll5   10.2.2   ceph mon 4 and ceph mds 2   (normally my active mds)
kroll6   10.2.0   ceph mon 5 and ceph mds B   (normally standby mds)

I had done a rolling upgrade of everything but kroll6 and had rebooted the
first three osd and mon servers.  mds 2 went down during the gentoo update of
kroll5 because of memory scarcity, so mds B was the active mds server.  After
rebooting kroll4 I found that mon 0 had gone down with the assertion
failure.  I ended up stopping all ceph processes, but desktops with client
mounts were all still up for the moment and would basically be stuck on
locks if I tried to access cephfs.

After trying to restart mons only beginning with mon 0 initially, the
following happened to mon.0 after enough mons were up for a quorum:

2016-07-03 16:34:26.555728 7fbff22f8480  1 leveldb: Recovering log 
#2592390
2016-07-03 16:34:26.555762 7fbff22f8480  1 leveldb: Level-0 table 
#2592397:

started
2016-07-03 16:34:26.558788 7fbff22f8480  1 leveldb: Level-0 table 
#2592397:

192 bytes OK
2016-07-03 16:34:26.562263 7fbff22f8480  1 leveldb: Delete type=3 
#2592388


2016-07-03 16:34:26.562364 7fbff22f8480  1 leveldb: Delete type=0 
#2592390


2016-07-03 16:34:26.563126 7fbff22f8480 -1 wrote monmap to
/etc/ceph/tmpmonmap
2016-07-03 17:09:25.753729 7f8291dff480  0 ceph version 10.2.2
(45107e21c568dd033c2f0a3107dec8f0b0e58374), pro
cess ceph-mon, pid 20842
2016-07-03 17:09:25.762588 7f8291dff480  1 leveldb: Recovering log 
#2592398
2016-07-03 17:09:25.767722 7f8291dff480  1 leveldb: Delete type=0 
#2592398


2016-07-03 17:09:25.767803 7f8291dff480  1 leveldb: Delete type=3 
#2592396


2016-07-03 17:09:25.768600 7f8291dff480  0 starting mon.0 rank 0 at
192.168.2.1:6789/0 mon_data /var/lib/ceph/mon/ceph-0 fsid
1798897a-f0c9-422d-86b3-d4933a12c7ac
2016-07-03 17:09:25.769066 7f8291dff480  1 mon.0@-1(probing) e10 
preinit

fsid 1798897a-f0c9-422d-86b3-d4933a12c7ac
2016-07-03 17:09:25.769923 7f8291dff480  1
mon.0@-1(probing).paxosservice(pgmap 17869652..17870289) refresh 
upgraded,

format 0 -> 1
2016-07-03 17:09:25.769947 7f8291dff480  1 mon.0@-1(probing).pg v0
on_upgrade discarding in-core PGMap
2016-07-03 17:09:25.776148 7f8291dff480  0 mon.0@-1(probing).mds 
e1532

print_map
e1532
enable_multiple, ever_enabled_multiple: 0,0
compat: compat={},rocompat={},incompat={1=base v0.20,2=client 
writeable
ranges,3=default file layouts on dirs,4=dir inode in separate 
object,5=mds
uses versioned encoding,6=dirfrag is stored in omap,8=no anchor 
table}


Filesystem 'cephfs' (0)
fs_name cephfs
epoch   1530
flags   0
modified2016-05-19 01:21:31.953710
tableserver 0
root0
session_timeout 60
session_autoclose   300
max_file_size   1099511627776
last_failure1478
last_failure_osd_epoch  26431
compat  compat={},rocompat={},incompat={1=base v0.20,2=client 
writeable
ranges,3=default file layouts on dirs,4=dir inode in separate 
object,5=mds
uses versioned encoding,6=dirfrag is stored in omap,8=no anchor 
table}

max_mds 1
in  0
up  {0=1190233}
failed
damaged
stopped

Re: [ceph-users] Data recovery stuck

2016-07-08 Thread Brad Hubbard
On Sat, Jul 9, 2016 at 1:20 AM, Pisal, Ranjit Dnyaneshwar
 wrote:
> Hi All,
>
>
>
> I am in the process of adding new OSDs to the cluster; however, after adding the
> second node, cluster recovery seems to have stopped.
>
> It's been more than 3 days, but the objects degraded % has not improved even by 1%.
>
> Will adding further OSDs help improve the situation, or is there any other way to
> improve the recovery process?
>
>
>
>
>
> [ceph@MYOPTPDN01 ~]$ ceph -s
>
> cluster 9e3e9015-f626-4a44-83f7-0a939ef7ec02
>
>  health HEALTH_WARN 315 pgs backfill; 23 pgs backfill_toofull; 3 pgs

You have 23 pgs that are "backfill_toofull". You need to identify these pgs.

You could try increasing the backfill full ratio on the OSDs involved:

ceph health detail
ceph tell osd.<id> injectargs '--osd-backfill-full-ratio=0.90'

Keep in mind new storage needs to be added to the cluster as soon as possible
but I guess that's what you are trying to do.

You could also look at reweighting the full OSDs if you have other OSDs with
considerable space available.
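For example (OSD id and weight below are purely illustrative):

ceph health detail | grep backfill_toofull
ceph osd df            # on releases that have it, to spot the nearly full OSDs
ceph osd reweight 17 0.85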

HTH,
Brad

> backfilling; 53 pgs degraded; 2 pgs recovering; 232 pgs recovery_wait; 552
> pgs stuck unclean; recovery 3622384/90976826 objects degraded (3.982%); 1
> near full osd(s)
>
>  monmap e4: 5 mons at
> {MYOPTPDN01=10.115.1.136:6789/0,MYOPTPDN02=10.115.1.137:6789/0,MYOPTPDN03=10.115.1.138:6789/0,MYOPTPDN04=10.115.1.139:6789/0,MYOPTPDN05=10.115.1.140:6789/0},
> election epoch 6654, quorum 0,1,2,3,4
> MYOPTPDN01,MYOPTPDN02,MYOPTPDN03,MYOPTPDN04,MYOPTPDN05
>
>  osdmap e198079: 171 osds: 171 up, 171 in
>
>   pgmap v26428186: 5696 pgs, 4 pools, 105 TB data, 28526 kobjects
>
> 329 TB used, 136 TB / 466 TB avail
>
> 3622384/90976826 objects degraded (3.982%)
>
>   23 active+remapped+wait_backfill+backfill_toofull
>
>  120 active+recovery_wait+remapped
>
> 5144 active+clean
>
>1 active+recovering+remapped
>
>  104 active+recovery_wait
>
>   45 active+degraded+remapped+wait_backfill
>
>1 active+recovering
>
>3 active+remapped+backfilling
>
>  247 active+remapped+wait_backfill
>
>8 active+recovery_wait+degraded+remapped
>
>   client io 62143 kB/s rd, 100 MB/s wr, 14427 op/s
>
> [ceph@MYOPTPDN01 ~]$
>
>
>
> Best Regards,
>
> Ranjit
>
>
>
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>



-- 
Cheers,
Brad
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Backing up RBD snapshots to a different cloud service

2016-07-08 Thread Brendan Moloney
Hi,

We have a smallish Ceph cluster for RBD images. We use snapshotting for local 
incremental backups.  I would like to start sending some of these snapshots to 
an external cloud service (likely Amazon) for disaster recovery purposes.

Does anyone have advice on how to do this?  I suppose I could just use the rbd 
export/diff commands but some of our RBD images are quite large (multiple 
terabytes) so I can imagine this becoming quite inefficient. We would either 
need to keep all snapshots indefinitely and retrieve every single snapshot to 
recover or we would have to occasionally send a new full disk image.
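For reference, I understand the incremental path with export-diff/import-diff to look roughly like this (image and snapshot names made up):

rbd snap create rbd/vm1@2016-07-08
rbd export-diff --from-snap 2016-07-07 rbd/vm1@2016-07-08 vm1-20160708.diff
# ship the diff off-site; on restore, replay the chain in order:
rbd import-diff vm1-20160708.diff rbd/vm1

Each diff only needs the previous snapshot to already exist on the destination image, so a full rbd export is still needed as the starting point.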

I guess doing the backups on the object level could potentially avoid these 
issues, but I am not sure how to go about that.

Any advice is greatly appreciated.

Thanks,
Brendan
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] (no subject)

2016-07-08 Thread Gaurav Goyal
I even tried with a bare .raw file, but the error is still the same.

2016-07-08 16:29:40.931 86007 INFO nova.compute.claims
[req-b43bbec9-c875-4f4b-ad2c-0d87a02bc7e1 289598890db341f4af45ce5c57c41ba3
713114f3b9e54501a35a79e84c1e6c9d - - -] [instance:
cb6056a8-1bb9-4475-a702-9a2b0a7dca01] Total memory: 193168 MB, used:
1024.00 MB

2016-07-08 16:29:40.931 86007 INFO nova.compute.claims
[req-b43bbec9-c875-4f4b-ad2c-0d87a02bc7e1 289598890db341f4af45ce5c57c41ba3
713114f3b9e54501a35a79e84c1e6c9d - - -] [instance:
cb6056a8-1bb9-4475-a702-9a2b0a7dca01] memory limit: 289752.00 MB, free:
288728.00 MB

2016-07-08 16:29:40.932 86007 INFO nova.compute.claims
[req-b43bbec9-c875-4f4b-ad2c-0d87a02bc7e1 289598890db341f4af45ce5c57c41ba3
713114f3b9e54501a35a79e84c1e6c9d - - -] [instance:
cb6056a8-1bb9-4475-a702-9a2b0a7dca01] Total disk: 8168 GB, used: 1.00 GB

2016-07-08 16:29:40.932 86007 INFO nova.compute.claims
[req-b43bbec9-c875-4f4b-ad2c-0d87a02bc7e1 289598890db341f4af45ce5c57c41ba3
713114f3b9e54501a35a79e84c1e6c9d - - -] [instance:
cb6056a8-1bb9-4475-a702-9a2b0a7dca01] disk limit: 8168.00 GB, free: 8167.00
GB

2016-07-08 16:29:40.948 86007 INFO nova.compute.claims
[req-b43bbec9-c875-4f4b-ad2c-0d87a02bc7e1 289598890db341f4af45ce5c57c41ba3
713114f3b9e54501a35a79e84c1e6c9d - - -] [instance:
cb6056a8-1bb9-4475-a702-9a2b0a7dca01] Claim successful

2016-07-08 16:29:41.384 86007 INFO nova.virt.libvirt.driver
[req-b43bbec9-c875-4f4b-ad2c-0d87a02bc7e1 289598890db341f4af45ce5c57c41ba3
713114f3b9e54501a35a79e84c1e6c9d - - -] [instance:
cb6056a8-1bb9-4475-a702-9a2b0a7dca01] Creating image

2016-07-08 16:29:42.259 86007 ERROR nova.compute.manager
[req-b43bbec9-c875-4f4b-ad2c-0d87a02bc7e1 289598890db341f4af45ce5c57c41ba3
713114f3b9e54501a35a79e84c1e6c9d - - -] [instance:
cb6056a8-1bb9-4475-a702-9a2b0a7dca01] Instance failed to spawn

2016-07-08 16:29:42.259 86007 ERROR nova.compute.manager [instance:
cb6056a8-1bb9-4475-a702-9a2b0a7dca01] Traceback (most recent call last):

2016-07-08 16:29:42.259 86007 ERROR nova.compute.manager [instance:
cb6056a8-1bb9-4475-a702-9a2b0a7dca01]   File
"/usr/lib/python2.7/site-packages/nova/compute/manager.py", line 2156, in
_build_resources

2016-07-08 16:29:42.259 86007 ERROR nova.compute.manager [instance:
cb6056a8-1bb9-4475-a702-9a2b0a7dca01] yield resources

2016-07-08 16:29:42.259 86007 ERROR nova.compute.manager [instance:
cb6056a8-1bb9-4475-a702-9a2b0a7dca01]   File
"/usr/lib/python2.7/site-packages/nova/compute/manager.py", line 2009, in
_build_and_run_instance

2016-07-08 16:29:42.259 86007 ERROR nova.compute.manager [instance:
cb6056a8-1bb9-4475-a702-9a2b0a7dca01]
block_device_info=block_device_info)

2016-07-08 16:29:42.259 86007 ERROR nova.compute.manager [instance:
cb6056a8-1bb9-4475-a702-9a2b0a7dca01]   File
"/usr/lib/python2.7/site-packages/nova/virt/libvirt/driver.py", line 2531,
in spawn

2016-07-08 16:29:42.259 86007 ERROR nova.compute.manager [instance:
cb6056a8-1bb9-4475-a702-9a2b0a7dca01] write_to_disk=True)

2016-07-08 16:29:42.259 86007 ERROR nova.compute.manager [instance:
cb6056a8-1bb9-4475-a702-9a2b0a7dca01]   File
"/usr/lib/python2.7/site-packages/nova/virt/libvirt/driver.py", line 4427,
in _get_guest_xml

2016-07-08 16:29:42.259 86007 ERROR nova.compute.manager [instance:
cb6056a8-1bb9-4475-a702-9a2b0a7dca01] context)

2016-07-08 16:29:42.259 86007 ERROR nova.compute.manager [instance:
cb6056a8-1bb9-4475-a702-9a2b0a7dca01]   File
"/usr/lib/python2.7/site-packages/nova/virt/libvirt/driver.py", line 4286,
in _get_guest_config

2016-07-08 16:29:42.259 86007 ERROR nova.compute.manager [instance:
cb6056a8-1bb9-4475-a702-9a2b0a7dca01] flavor, guest.os_type)

2016-07-08 16:29:42.259 86007 ERROR nova.compute.manager [instance:
cb6056a8-1bb9-4475-a702-9a2b0a7dca01]   File
"/usr/lib/python2.7/site-packages/nova/virt/libvirt/driver.py", line 3387,
in _get_guest_storage_config

2016-07-08 16:29:42.259 86007 ERROR nova.compute.manager [instance:
cb6056a8-1bb9-4475-a702-9a2b0a7dca01] inst_type)

2016-07-08 16:29:42.259 86007 ERROR nova.compute.manager [instance:
cb6056a8-1bb9-4475-a702-9a2b0a7dca01]   File
"/usr/lib/python2.7/site-packages/nova/virt/libvirt/driver.py", line 3320,
in _get_guest_disk_config

2016-07-08 16:29:42.259 86007 ERROR nova.compute.manager [instance:
cb6056a8-1bb9-4475-a702-9a2b0a7dca01] raise exception.Invalid(msg)

2016-07-08 16:29:42.259 86007 ERROR nova.compute.manager [instance:
cb6056a8-1bb9-4475-a702-9a2b0a7dca01] Invalid: Volume sets discard option,
but libvirt (1, 0, 6) or later is required, qemu (1, 6, 0) or later is
required.

2016-07-08 16:29:42.259 86007 ERROR nova.compute.manager [instance:
cb6056a8-1bb9-4475-a702-9a2b0a7dca01]

2016-07-08 16:29:42.261 86007 INFO nova.compute.manager
[req-b43bbec9-c875-4f4b-ad2c-0d87a02bc7e1 289598890db341f4af45ce5c57c41ba3
713114f3b9e54501a35a79e84c1e6c9d - - -] [instance:
cb6056a8-1bb9-4475-a702-9a2b0a7dca01] Terminating instance

2016-07-08 16:29:42.267 86007 INFO nova.virt.libvirt.driver [-] 

Re: [ceph-users] ceph + vmware

2016-07-08 Thread Jan Schermer
There is no Ceph plugin for VMware (and I think you need at least an Enterprise 
license for storage plugins, much $$$).
The "VMware" way to do this without the plugin would be to have a VM running on 
every host serving RBD devices over iSCSI to the other VMs (the way their 
storage applicances work, maybe you could even re-use them somehow? I haven't 
used VMware in a while, so not sure if one can login to the appliance and 
customize it...).
Nevertheless I think it's ugly, messy and is going to be even slower than Ceph 
by itself.

But you can always just use the RBD client (kernel or userspace) in the VMs 
themselves; VMware has pretty fast networking, so the overhead wouldn't be that 
large.
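A minimal in-guest example with the kernel client would be something like this (pool/image and mount point are made up, and the guest kernel needs the rbd module):

rbd map rbd/vm-data --id admin
mkfs.xfs /dev/rbd/rbd/vm-data    # first use only
mount /dev/rbd/rbd/vm-data /mnt/data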

Jan


> On 08 Jul 2016, at 21:22, Oliver Dzombic  wrote:
> 
> Hi,
> 
> does anyone have experience with a smart way to connect VMware with Ceph?
> 
> iSCSI multipath did not really work well.
> NFS could work, but I think that's just too many layers in between to get
> some usable performance.
> 
> Systems like ScaleIO have developed a vmware addon to talk with it.
> 
> Is there something similar out there for ceph ?
> 
> What are you using ?
> 
> Thank you !
> 
> -- 
> Mit freundlichen Gruessen / Best regards
> 
> Oliver Dzombic
> IP-Interactive
> 
> mailto:i...@ip-interactive.de
> 
> Anschrift:
> 
> IP Interactive UG ( haftungsbeschraenkt )
> Zum Sonnenberg 1-3
> 63571 Gelnhausen
> 
> HRB 93402 beim Amtsgericht Hanau
> Geschäftsführung: Oliver Dzombic
> 
> Steuer Nr.: 35 236 3622 1
> UST ID: DE274086107
> 
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] ceph + vmware

2016-07-08 Thread Oliver Dzombic
Hi,

does anyone have experience with a smart way to connect VMware with Ceph?

iSCSI multipath did not really work well.
NFS could work, but I think that's just too many layers in between to get
some usable performance.

Systems like ScaleIO have developed a vmware addon to talk with it.

Is there something similar out there for ceph ?

What are you using ?

Thank you !

-- 
Mit freundlichen Gruessen / Best regards

Oliver Dzombic
IP-Interactive

mailto:i...@ip-interactive.de

Anschrift:

IP Interactive UG ( haftungsbeschraenkt )
Zum Sonnenberg 1-3
63571 Gelnhausen

HRB 93402 beim Amtsgericht Hanau
Geschäftsführung: Oliver Dzombic

Steuer Nr.: 35 236 3622 1
UST ID: DE274086107

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Quick short survey which SSDs

2016-07-08 Thread Carlos M. Perez
I posted a bunch of the more recent numbers in the specs.  We had some downtime 
and a bunch of SSDs lying around, so I was curious whether any were hidden 
gems... Interestingly, the Intel drives don't seem to require the write cache to 
be turned off, while the other drives had to be "forced" off using hdparm -W0 
/dev/sdx to make sure it was off.
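(For reference, querying and forcing it looks like this, assuming /dev/sdx is the drive under test:)

hdparm -W /dev/sdx     # query the volatile write cache state
hdparm -W0 /dev/sdx    # disable it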

The machine we tested on is a Dell C2100 Dual x5560, 96GB ram, LSI2008 IT mode 
controller

intel Dc S3700 200GB
Model Number:   INTEL SSDSC2BA200G3L
Firmware Revision:  5DV10265

1 - io=4131.2MB, bw=70504KB/s, iops=17626, runt= 60001msec
5 - io=9529.1MB, bw=162627KB/s, iops=40656, runt= 60001msec
10 - io=7130.5MB, bw=121684KB/s, iops=30421, runt= 60004msec

Samsung SM863
Model Number:   SAMSUNG MZ7KM240HAGR-0E005
Firmware Revision:  GXM1003Q

1 - io=2753.1MB, bw=47001KB/s, iops=11750, runt= 6msec
5 - io=6248.8MB, bw=106643KB/s, iops=26660, runt= 60001msec
10 - io=8084.1MB, bw=137981KB/s, iops=34495, runt= 60001msec

We decided to go with the Intel model.  The Samsung was impressive on the higher 
end with multiple threads, but we figured that for most of our nodes with 4-6 OSDs 
the Intel was a bit more proven and had better "light-to-medium" load numbers.

Carlos M. Perez
CMP Consulting Services
305-669-1515

-Original Message-
From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of Dan 
van der Ster
Sent: Tuesday, July 5, 2016 4:23 AM
To: Christian Balzer 
Cc: ceph-users 
Subject: Re: [ceph-users] Quick short survey which SSDs

On Tue, Jul 5, 2016 at 10:04 AM, Dan van der Ster  wrote:
> On Tue, Jul 5, 2016 at 9:53 AM, Christian Balzer  wrote:
>>> Unfamiliar: Samsung SM863
>>>
>> You might want to read the thread here:
>> http://lists.ceph.com/pipermail/ceph-users-ceph.com/2016-February/007
>> 871.html
>>
>> And google "ceph SM863".
>>
>> However I'm still waiting for somebody to confirm that these perform 
>> (as one would expect from DC level SSDs) at full speed with sync 
>> writes, which is the only important factor for journals.
>
> Tell me the fio options you're interested in and I'll run it right now.

Using the options from Sebastien's blog I get:

1 job: write: io=5863.3MB, bw=100065KB/s, iops=25016, runt= 60001msec
5 jobs: write: io=11967MB, bw=204230KB/s, iops=51057, runt= 60001msec
10 jobs: write: io=13760MB, bw=234829KB/s, iops=58707, runt= 60001msec

Drive is model MZ7KM240 with firmware GXM1003Q.

--
Dan


[1] fio --filename=/dev/sdc --direct=1 --sync=1 --rw=write --bs=4k
--numjobs=1 --iodepth=1 --runtime=60 --time_based --group_reporting 
--name=journal-test
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] (no subject)

2016-07-08 Thread Gaurav Goyal
[root@OSKVM1 ~]# grep -v "^#" /etc/nova/nova.conf|grep -v ^$

[DEFAULT]

instance_usage_audit = True

instance_usage_audit_period = hour

notify_on_state_change = vm_and_task_state

notification_driver = messagingv2

rbd_user=cinder

rbd_secret_uuid=1989f7a6-4ecb-4738-abbf-2962c29b2bbb

rpc_backend = rabbit

auth_strategy = keystone

my_ip = 10.1.0.4

network_api_class = nova.network.neutronv2.api.API

security_group_api = neutron

linuxnet_interface_driver =
nova.network.linux_net.NeutronLinuxBridgeInterfaceDriver

firewall_driver = nova.virt.firewall.NoopFirewallDriver

enabled_apis=osapi_compute,metadata

[api_database]

connection = mysql://nova:nova@controller/nova

[barbican]

[cells]

[cinder]

os_region_name = RegionOne

[conductor]

[cors]

[cors.subdomain]

[database]

[ephemeral_storage_encryption]

[glance]

host = controller

[guestfs]

[hyperv]

[image_file_url]

[ironic]

[keymgr]

[keystone_authtoken]

auth_uri = http://controller:5000

auth_url = http://controller:35357

auth_plugin = password

project_domain_id = default

user_domain_id = default

project_name = service

username = nova

password = nova

[libvirt]

inject_password=false

inject_key=false

inject_partition=-2

live_migration_flag=VIR_MIGRATE_UNDEFINE_SOURCE, VIR_MIGRATE_PEER2PEER,
VIR_MIGRATE_LIVE, VIR_MIGRATE_PERSIST_DEST, VIR_MIGRATE_TUNNELLED

disk_cachemodes ="network=writeback"

images_type=rbd

images_rbd_pool=vms

images_rbd_ceph_conf =/etc/ceph/ceph.conf

rbd_user=cinder

rbd_secret_uuid=1989f7a6-4ecb-4738-abbf-2962c29b2bbb

hw_disk_discard=unmap

[matchmaker_redis]

[matchmaker_ring]

[metrics]

[neutron]

url = http://controller:9696

auth_url = http://controller:35357

auth_plugin = password

project_domain_id = default

user_domain_id = default

region_name = RegionOne

project_name = service

username = neutron

password = neutron

service_metadata_proxy = True

metadata_proxy_shared_secret = X

[osapi_v21]

[oslo_concurrency]

lock_path = /var/lib/nova/tmp

[oslo_messaging_amqp]

[oslo_messaging_qpid]

[oslo_messaging_rabbit]

rabbit_host = controller

rabbit_userid = openstack

rabbit_password = X

[oslo_middleware]

[rdp]

[serial_console]

[spice]

[ssl]

[trusted_computing]

[upgrade_levels]

[vmware]

[vnc]

enabled = True

vncserver_listen = 0.0.0.0

novncproxy_base_url = http://controller:6080/vnc_auto.html

vncserver_proxyclient_address = $my_ip

[workarounds]

[xenserver]

[zookeeper]


[root@OSKVM1 ceph]# ls -ltr

total 24

-rwxr-xr-x 1 root   root92 May 10 12:58 rbdmap

-rw--- 1 root   root 0 Jun 28 11:05 tmpfDt6jw

-rw-r--r-- 1 root   root63 Jul  5 12:59 ceph.client.admin.keyring

-rw-r--r-- 1 glance glance  64 Jul  5 14:51 ceph.client.glance.keyring

-rw-r--r-- 1 cinder cinder  64 Jul  5 14:53 ceph.client.cinder.keyring

-rw-r--r-- 1 cinder cinder  71 Jul  5 14:54
ceph.client.cinder-backup.keyring

-rwxrwxrwx 1 root   root   438 Jul  7 14:19 ceph.conf

[root@OSKVM1 ceph]# more ceph.client.cinder.keyring

[client.cinder]

key = AQCIAHxX9ga8LxAAU+S3Vybdu+Cm2bP3lplGnA==

[root@OSKVM1 ~]# rados lspools

rbd

volumes

images

backups

vms

[root@OSKVM1 ~]# rbd -p rbd ls

[root@OSKVM1 ~]# rbd -p volumes ls

volume-27717a88-3c80-420f-8887-4ca5c5b94023

volume-3bd22868-cb2a-4881-b9fb-ae91a6f79cb9

volume-b9cf7b94-cfb6-4b55-816c-10c442b23519

[root@OSKVM1 ~]# rbd -p images ls

9aee6c4e-3b60-49d5-8e17-33953e384a00

a8b45c8a-a5c8-49d8-a529-1e4088bdbf3f

[root@OSKVM1 ~]# rbd -p vms ls

[root@OSKVM1 ~]# rbd -p backup


I could create a cinder volume and attach it to one of the already-built nova
instances.

[root@OSKVM1 ceph]# nova volume-list

WARNING: Command volume-list is deprecated and will be removed after Nova
13.0.0 is released. Use python-cinderclient or openstackclient instead.

+--+---+--+--+-+--+

| ID   | Status| Display Name |
Size | Volume Type | Attached to  |

+--+---+--+--+-+--+

| 14a572d0-2834-40d6-9650-cb3e18271963 | available | nova-vol_gg  | 10
  | -   |  |

| 3bd22868-cb2a-4881-b9fb-ae91a6f79cb9 | in-use| nova-vol_1   | 2
  | -   | d06f7c3b-5bbd-4597-99ce-fa981d2e10db |

| 27717a88-3c80-420f-8887-4ca5c5b94023 | available | cinder-ceph-vol1 | 10
  | -   |  |

+--+---+--+--+-+--+
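A quick way to confirm the libvirt secret on each compute node actually carries the cinder key (UUID as configured in nova.conf above) would be:

virsh secret-list
virsh secret-get-value 1989f7a6-4ecb-4738-abbf-2962c29b2bbb
ceph auth get-key client.cinder

The last two values should match on every compute node.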

On Fri, Jul 8, 2016 at 11:33 AM, Gaurav Goyal 
wrote:

> Hi Kees,
>
> I regenerated the UUID as per your suggestion.
> Now i have same UUID in host1 and host2.
> I could create volumes and attach them to existing VMs.
>
> I could create new glance images.
>
> But still 

Re: [ceph-users] Question about how to start ceph OSDs with systemd

2016-07-08 Thread Tom Barron


On 07/08/2016 11:59 AM, Manuel Lausch wrote:
> hi,
> 
> In the last days I do play around with ceph jewel on debian Jessie and
> CentOS 7. Now I have a question about systemd on this Systems.
> 
> I installed ceph jewel (ceph version 10.2.2
> (45107e21c568dd033c2f0a3107dec8f0b0e58374)) on debian Jessie and
> prepared some OSDs. While playing around I decided to reinstall my
> operating system (of course without deleting the OSD devices ). After
> the reinstallation of ceph and put in the old ceph.conf I thought the
> previously prepared OSDs do easily start and all will be fine after that.
> 
> With debian Wheezy and ceph firefly this worked well, but with the new
> versions and systemd this doesn't work at all. Now what have I to do to
> get the OSDs running again?
> 
> The following command didn't work and I didn't get any output from it.
>   systemctl start ceph-osd.target
> 
> And this is the output from systemctl status ceph-osd.target
> ● ceph-osd.target - ceph target allowing to start/stop all
> ceph-osd@.service instances at once
>Loaded: loaded (/usr/lib/systemd/system/ceph-osd.target; enabled;
> vendor preset: enabled)
>Active: active since Fri 2016-07-08 17:19:29 CEST; 36min ago
> 
> Jul 08 17:19:29 cs-dellbrick01.server.lan systemd[1]: Reached target
> ceph target allowing to start/stop all ceph-osd@.service instances at once.
> Jul 08 17:19:29 cs-dellbrick01.server.lan systemd[1]: Starting ceph
> target allowing to start/stop all ceph-osd@.service instances at once.
> Jul 08 17:31:15 cs-dellbrick01.server.lan systemd[1]: Reached target
> ceph target allowing to start/stop all ceph-osd@.service instances at once.
> 
> 

Manuel,

You may find our changes to the devstack ceph plugin here [1] for
systemd vs upstart vs sysvinit control of ceph services helpful.  We
tested against xenial and fedora24 for the systemd paths.
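For a plain (non-devstack) Jewel node, the systemd path boils down to ceph-disk activation plus per-OSD units; roughly (OSD id below is an example):

ceph-disk activate-all          # re-activate all prepared OSD data partitions
systemctl enable ceph-osd@0
systemctl start ceph-osd@0
systemctl status ceph-osd@0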

Cheers,

-- Tom

[1] https://review.openstack.org/#/c/332484/

> 
> thanks,
> Manuel
> 
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Question about how to start ceph OSDs with systemd

2016-07-08 Thread Manuel Lausch

hi,

In the last days I do play around with ceph jewel on debian Jessie and 
CentOS 7. Now I have a question about systemd on this Systems.


I installed ceph jewel (ceph version 10.2.2 
(45107e21c568dd033c2f0a3107dec8f0b0e58374)) on debian Jessie and 
prepared some OSDs. While playing around I decided to reinstall my 
operating system (of course without deleting the OSD devices ). After 
the reinstallation of ceph and put in the old ceph.conf I thought the 
previously prepared OSDs do easily start and all will be fine after that.


With debian Wheezy and ceph firefly this worked well, but with the new 
versions and systemd this doesn't work at all. Now what have I to do to 
get the OSDs running again?


The following command didn't work and I didn't get any output from it.
  systemctl start ceph-osd.target

And this is the output from systemctl status ceph-osd.target
● ceph-osd.target - ceph target allowing to start/stop all 
ceph-osd@.service instances at once
   Loaded: loaded (/usr/lib/systemd/system/ceph-osd.target; enabled; 
vendor preset: enabled)

   Active: active since Fri 2016-07-08 17:19:29 CEST; 36min ago

Jul 08 17:19:29 cs-dellbrick01.server.lan systemd[1]: Reached target 
ceph target allowing to start/stop all ceph-osd@.service instances at once.
Jul 08 17:19:29 cs-dellbrick01.server.lan systemd[1]: Starting ceph 
target allowing to start/stop all ceph-osd@.service instances at once.
Jul 08 17:31:15 cs-dellbrick01.server.lan systemd[1]: Reached target 
ceph target allowing to start/stop all ceph-osd@.service instances at once.




thanks,
Manuel

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] (no subject)

2016-07-08 Thread Gaurav Goyal
Hi Kees,

I regenerated the UUID as per your suggestion.
Now I have the same UUID on host1 and host2.
I could create volumes and attach them to existing VMs.

I could create new glance images.

But I am still getting the same error when launching an instance via the GUI.


2016-07-08 11:23:25.067 86007 INFO nova.compute.resource_tracker
[req-4b7eccc8-0bf5-4f55-a941-4c93e97ef5df - - - - -] Auditing locally
available compute resources for node controller

2016-07-08 11:23:25.527 86007 INFO nova.compute.resource_tracker
[req-4b7eccc8-0bf5-4f55-a941-4c93e97ef5df - - - - -] Total usable vcpus:
40, total allocated vcpus: 0

2016-07-08 11:23:25.527 86007 INFO nova.compute.resource_tracker
[req-4b7eccc8-0bf5-4f55-a941-4c93e97ef5df - - - - -] Final resource view:
name=controller phys_ram=193168MB used_ram=1024MB phys_disk=8168GB
used_disk=1GB total_vcpus=40 used_vcpus=0 pci_stats=None

2016-07-08 11:23:25.560 86007 INFO nova.compute.resource_tracker
[req-4b7eccc8-0bf5-4f55-a941-4c93e97ef5df - - - - -] Compute_service record
updated for OSKVM1:controller

2016-07-08 11:24:25.065 86007 INFO nova.compute.resource_tracker
[req-4b7eccc8-0bf5-4f55-a941-4c93e97ef5df - - - - -] Auditing locally
available compute resources for node controller

2016-07-08 11:24:25.561 86007 INFO nova.compute.resource_tracker
[req-4b7eccc8-0bf5-4f55-a941-4c93e97ef5df - - - - -] Total usable vcpus:
40, total allocated vcpus: 0

2016-07-08 11:24:25.562 86007 INFO nova.compute.resource_tracker
[req-4b7eccc8-0bf5-4f55-a941-4c93e97ef5df - - - - -] Final resource view:
name=controller phys_ram=193168MB used_ram=1024MB phys_disk=8168GB
used_disk=1GB total_vcpus=40 used_vcpus=0 pci_stats=None

2016-07-08 11:24:25.603 86007 INFO nova.compute.resource_tracker
[req-4b7eccc8-0bf5-4f55-a941-4c93e97ef5df - - - - -] Compute_service record
updated for OSKVM1:controller

2016-07-08 11:25:18.138 86007 INFO nova.compute.manager
[req-3173f5b7-fa02-420c-954b-e21c3ce8d183 289598890db341f4af45ce5c57c41ba3
713114f3b9e54501a35a79e84c1e6c9d - - -] [instance:
bf4839c8-2af6-4959-9158-fe411e1cfae7] Starting instance...

2016-07-08 11:25:18.255 86007 INFO nova.compute.claims
[req-3173f5b7-fa02-420c-954b-e21c3ce8d183 289598890db341f4af45ce5c57c41ba3
713114f3b9e54501a35a79e84c1e6c9d - - -] [instance:
bf4839c8-2af6-4959-9158-fe411e1cfae7] Attempting claim: memory 512 MB, disk
1 GB

2016-07-08 11:25:18.255 86007 INFO nova.compute.claims
[req-3173f5b7-fa02-420c-954b-e21c3ce8d183 289598890db341f4af45ce5c57c41ba3
713114f3b9e54501a35a79e84c1e6c9d - - -] [instance:
bf4839c8-2af6-4959-9158-fe411e1cfae7] Total memory: 193168 MB, used:
1024.00 MB

2016-07-08 11:25:18.256 86007 INFO nova.compute.claims
[req-3173f5b7-fa02-420c-954b-e21c3ce8d183 289598890db341f4af45ce5c57c41ba3
713114f3b9e54501a35a79e84c1e6c9d - - -] [instance:
bf4839c8-2af6-4959-9158-fe411e1cfae7] memory limit: 289752.00 MB, free:
288728.00 MB

2016-07-08 11:25:18.256 86007 INFO nova.compute.claims
[req-3173f5b7-fa02-420c-954b-e21c3ce8d183 289598890db341f4af45ce5c57c41ba3
713114f3b9e54501a35a79e84c1e6c9d - - -] [instance:
bf4839c8-2af6-4959-9158-fe411e1cfae7] Total disk: 8168 GB, used: 1.00 GB

2016-07-08 11:25:18.257 86007 INFO nova.compute.claims
[req-3173f5b7-fa02-420c-954b-e21c3ce8d183 289598890db341f4af45ce5c57c41ba3
713114f3b9e54501a35a79e84c1e6c9d - - -] [instance:
bf4839c8-2af6-4959-9158-fe411e1cfae7] disk limit: 8168.00 GB, free: 8167.00
GB

2016-07-08 11:25:18.271 86007 INFO nova.compute.claims
[req-3173f5b7-fa02-420c-954b-e21c3ce8d183 289598890db341f4af45ce5c57c41ba3
713114f3b9e54501a35a79e84c1e6c9d - - -] [instance:
bf4839c8-2af6-4959-9158-fe411e1cfae7] Claim successful

2016-07-08 11:25:18.747 86007 INFO nova.virt.libvirt.driver
[req-3173f5b7-fa02-420c-954b-e21c3ce8d183 289598890db341f4af45ce5c57c41ba3
713114f3b9e54501a35a79e84c1e6c9d - - -] [instance:
bf4839c8-2af6-4959-9158-fe411e1cfae7] Creating image

2016-07-08 11:25:19.126 86007 ERROR nova.compute.manager
[req-3173f5b7-fa02-420c-954b-e21c3ce8d183 289598890db341f4af45ce5c57c41ba3
713114f3b9e54501a35a79e84c1e6c9d - - -] [instance:
bf4839c8-2af6-4959-9158-fe411e1cfae7] Instance failed to spawn

2016-07-08 11:25:19.126 86007 ERROR nova.compute.manager [instance:
bf4839c8-2af6-4959-9158-fe411e1cfae7] Traceback (most recent call last):

2016-07-08 11:25:19.126 86007 ERROR nova.compute.manager [instance:
bf4839c8-2af6-4959-9158-fe411e1cfae7]   File
"/usr/lib/python2.7/site-packages/nova/compute/manager.py", line 2156, in
_build_resources

2016-07-08 11:25:19.126 86007 ERROR nova.compute.manager [instance:
bf4839c8-2af6-4959-9158-fe411e1cfae7] yield resources

2016-07-08 11:25:19.126 86007 ERROR nova.compute.manager [instance:
bf4839c8-2af6-4959-9158-fe411e1cfae7]   File
"/usr/lib/python2.7/site-packages/nova/compute/manager.py", line 2009, in
_build_and_run_instance

2016-07-08 11:25:19.126 86007 ERROR nova.compute.manager [instance:
bf4839c8-2af6-4959-9158-fe411e1cfae7]
block_device_info=block_device_info)

2016-07-08 11:25:19.126 86007 ERROR nova.compute.manager 

[ceph-users] Data recovery stuck

2016-07-08 Thread Pisal, Ranjit Dnyaneshwar
Hi All,

I am in the process of adding new OSDs to the cluster; however, after adding the 
second node, cluster recovery seems to have stopped.

It's been more than 3 days, but the objects degraded % has not improved even by 1%.

Will adding further OSDs help improve the situation, or is there any other way to 
improve the recovery process?


[ceph@MYOPTPDN01 ~]$ ceph -s
cluster 9e3e9015-f626-4a44-83f7-0a939ef7ec02
 health HEALTH_WARN 315 pgs backfill; 23 pgs backfill_toofull; 3 pgs 
backfilling; 53 pgs degraded; 2 pgs recovering; 232 pgs recovery_wait; 552 pgs 
stuck unclean; recovery 3622384/90976826 objects degraded (3.982%); 1 near full 
osd(s)
 monmap e4: 5 mons at 
{MYOPTPDN01=10.115.1.136:6789/0,MYOPTPDN02=10.115.1.137:6789/0,MYOPTPDN03=10.115.1.138:6789/0,MYOPTPDN04=10.115.1.139:6789/0,MYOPTPDN05=10.115.1.140:6789/0},
 election epoch 6654, quorum 0,1,2,3,4 
MYOPTPDN01,MYOPTPDN02,MYOPTPDN03,MYOPTPDN04,MYOPTPDN05
 osdmap e198079: 171 osds: 171 up, 171 in
  pgmap v26428186: 5696 pgs, 4 pools, 105 TB data, 28526 kobjects
329 TB used, 136 TB / 466 TB avail
3622384/90976826 objects degraded (3.982%)
  23 active+remapped+wait_backfill+backfill_toofull
 120 active+recovery_wait+remapped
5144 active+clean
   1 active+recovering+remapped
 104 active+recovery_wait
  45 active+degraded+remapped+wait_backfill
   1 active+recovering
   3 active+remapped+backfilling
 247 active+remapped+wait_backfill
   8 active+recovery_wait+degraded+remapped
  client io 62143 kB/s rd, 100 MB/s wr, 14427 op/s
[ceph@MYOPTPDN01 ~]$

Best Regards,
Ranjit

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] 5 pgs of 712 stuck in active+remapped

2016-07-08 Thread Nathanial Byrnes
Thanks for the pointer, I didn't know the answer, but now I do, and 
unfortunately, XenServer is relying on the kernel module. It's 
surprising that their latest release XenServer 7 which was released on 
the 6th of July is only using kernel 3.10 ... I guess since it is based 
upon CentOS 7 and that is still shipping with 3.10...  Maybe I can hack 
something together in userspace on the xen boxes that plays nicely with 
the XenAPI.


Thanks again. Regards,
Nate


On 07/08/2016 09:40 AM, Micha Krause wrote:

Hi,

> Ah, thanks Micha, that makes sense. I'll see if I can dig up another 
server to build an OSD server. Sadly, XenServer is not tolerant of new 
kernels.
> Do you happen to know if there is a dkms package of RBD anywhere? I 
might be able to build the latest RBD against the 3.10 kernel that 
comes with XenServer 7


I don't know, is XenServer really using the kernel-rbd, and not librbd?
Just want to make sure you aren't looking at the wrong thing to update.

Micha Krause

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] mds standby + standby-reply upgrade

2016-07-08 Thread Patrick Donnelly
Hi Dzianis,

On Thu, Jun 30, 2016 at 4:03 PM, Dzianis Kahanovich  wrote:
> Upgraded infernalis->jewel (git, Gentoo). Upgrade passed over global
> stop/restart everything oneshot.
>
> Infernalis: e5165: 1/1/1 up {0=c=up:active}, 1 up:standby-replay, 1 up:standby
>
> Now after upgrade start and next mon restart, active monitor falls with
> "assert(info.state == MDSMap::STATE_STANDBY)" (even without running mds) .

This is the first time you've upgraded your pool to jewel right?
Straight from 9.X to 10.2.2?

-- 
Patrick Donnelly
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] (no subject)

2016-07-08 Thread Gaurav Goyal
Hi Kees,

Thanks for your help!

Node 1 controller + compute

-rw-r--r-- 1 root   root63 Jul  5 12:59 ceph.client.admin.keyring

-rw-r--r-- 1 glance glance  64 Jul  5 14:51 ceph.client.glance.keyring

-rw-r--r-- 1 cinder cinder  64 Jul  5 14:53 ceph.client.cinder.keyring

-rw-r--r-- 1 cinder cinder  71 Jul  5 14:54
ceph.client.cinder-backup.keyring

Node 2 compute2

-rw-r--r--  1 root root  63 Jul  5 12:59 ceph.client.admin.keyring

-rw-r--r--  1 root root  64 Jul  5 14:57 ceph.client.cinder.keyring

[root@OSKVM2 ceph]# chown cinder:cinder ceph.client.cinder.keyring

chown: invalid user: ‘cinder:cinder’


For the section below, should I generate a separate UUID for each compute host?

I executed uuidgen on host1 and put the same UUID on the second one. I need your
help to get rid of this problem.

Then, on the compute nodes, add the secret key to libvirt and remove the
temporary copy of the key:

uuidgen
457eb676-33da-42ec-9a8c-9293d545c337

cat > secret.xml <<EOF
<secret ephemeral='no' private='no'>
  <uuid>457eb676-33da-42ec-9a8c-9293d545c337</uuid>
  <usage type='ceph'>
    <name>client.cinder secret</name>
  </usage>
</secret>
EOF
sudo virsh secret-define --file secret.xml
Secret 457eb676-33da-42ec-9a8c-9293d545c337 created
sudo virsh secret-set-value --secret
457eb676-33da-42ec-9a8c-9293d545c337 --base64 $(cat client.cinder.key)
&& rm client.cinder.key secret.xml

Moreover, I do not find the libvirtd group.

[root@OSKVM1 ceph]# chown qemu:libvirtd /var/run/ceph/guests/

chown: invalid group: ‘qemu:libvirtd’


Regards

Gaurav Goyal

On Fri, Jul 8, 2016 at 9:40 AM, Kees Meijs  wrote:

> Hi Gaurav,
>
> Have you distributed your Ceph authentication keys to your compute
> nodes? And, do they have the correct permissions in terms of Ceph?
>
> K.
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] (no subject)

2016-07-08 Thread Kees Meijs
Hi,

I'd recommend generating a UUID and using it for all your compute nodes.
This way, you can keep your libvirt configuration constant.

Regards,
Kees

On 08-07-16 16:15, Gaurav Goyal wrote:
>
> For the section below, should I generate a separate UUID for each compute host?
>

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] 5 pgs of 712 stuck in active+remapped

2016-07-08 Thread Micha Krause

Hi,

> Ah, thanks Micha, that makes sense. I'll see if I can dig up another server 
to build an OSD server. Sadly, XenServer is not tolerant of new kernels.
> Do you happen to know if there is a dkms package of RBD anywhere? I might be 
able to build the latest RBD against the 3.10 kernel that comes with XenServer 
7

I don't know, is XenServer really using the kernel-rbd, and not librbd?
Just want to make sure you aren't looking at the wrong thing to update.
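A quick check on the XenServer host would be something like:

lsmod | grep rbd       # is the kernel client loaded at all?
rbd showmapped         # any images mapped through the kernel client?

If both come back empty, the storage path is probably going through librbd (e.g. via qemu/tapdisk) rather than the kernel module.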

Micha Krause
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] (no subject)

2016-07-08 Thread Kees Meijs
Hi Gaurav,

Have you distributed your Ceph authentication keys to your compute
nodes? And, do they have the correct permissions in terms of Ceph?

K.
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] (no subject)

2016-07-08 Thread Gaurav Goyal
Hello,

Thanks, I could restore my cinder service.

But while trying to launch an instance, I am getting the same error.
Can you please help me understand what I am doing wrong?

2016-07-08 09:28:31.368 31909 INFO nova.compute.manager
[req-c56770a7-5bab-426b-b763-7473254c6410 289598890db341f4af45ce5c57c41ba3
713114f3b9e54501a35a79e84c1e6c9d - - -] [instance:
ded46ee2-9c8f-45f7-b29f-a2d6e0e08b88] Starting instance...

2016-07-08 09:28:31.484 31909 INFO nova.compute.claims
[req-c56770a7-5bab-426b-b763-7473254c6410 289598890db341f4af45ce5c57c41ba3
713114f3b9e54501a35a79e84c1e6c9d - - -] [instance:
ded46ee2-9c8f-45f7-b29f-a2d6e0e08b88] Attempting claim: memory 512 MB, disk
1 GB

2016-07-08 09:28:31.485 31909 INFO nova.compute.claims
[req-c56770a7-5bab-426b-b763-7473254c6410 289598890db341f4af45ce5c57c41ba3
713114f3b9e54501a35a79e84c1e6c9d - - -] [instance:
ded46ee2-9c8f-45f7-b29f-a2d6e0e08b88] Total memory: 193168 MB, used:
1024.00 MB

2016-07-08 09:28:31.485 31909 INFO nova.compute.claims
[req-c56770a7-5bab-426b-b763-7473254c6410 289598890db341f4af45ce5c57c41ba3
713114f3b9e54501a35a79e84c1e6c9d - - -] [instance:
ded46ee2-9c8f-45f7-b29f-a2d6e0e08b88] memory limit: 289752.00 MB, free:
288728.00 MB

2016-07-08 09:28:31.485 31909 INFO nova.compute.claims
[req-c56770a7-5bab-426b-b763-7473254c6410 289598890db341f4af45ce5c57c41ba3
713114f3b9e54501a35a79e84c1e6c9d - - -] [instance:
ded46ee2-9c8f-45f7-b29f-a2d6e0e08b88] Total disk: 8168 GB, used: 1.00 GB

2016-07-08 09:28:31.486 31909 INFO nova.compute.claims
[req-c56770a7-5bab-426b-b763-7473254c6410 289598890db341f4af45ce5c57c41ba3
713114f3b9e54501a35a79e84c1e6c9d - - -] [instance:
ded46ee2-9c8f-45f7-b29f-a2d6e0e08b88] disk limit: 8168.00 GB, free: 8167.00
GB

2016-07-08 09:28:31.503 31909 INFO nova.compute.claims
[req-c56770a7-5bab-426b-b763-7473254c6410 289598890db341f4af45ce5c57c41ba3
713114f3b9e54501a35a79e84c1e6c9d - - -] [instance:
ded46ee2-9c8f-45f7-b29f-a2d6e0e08b88] Claim successful

2016-07-08 09:28:31.985 31909 INFO nova.virt.libvirt.driver
[req-c56770a7-5bab-426b-b763-7473254c6410 289598890db341f4af45ce5c57c41ba3
713114f3b9e54501a35a79e84c1e6c9d - - -] [instance:
ded46ee2-9c8f-45f7-b29f-a2d6e0e08b88] Creating image

2016-07-08 09:28:32.573 31909 ERROR nova.compute.manager
[req-c56770a7-5bab-426b-b763-7473254c6410 289598890db341f4af45ce5c57c41ba3
713114f3b9e54501a35a79e84c1e6c9d - - -] [instance:
ded46ee2-9c8f-45f7-b29f-a2d6e0e08b88] Instance failed to spawn

2016-07-08 09:28:32.573 31909 ERROR nova.compute.manager [instance:
ded46ee2-9c8f-45f7-b29f-a2d6e0e08b88] Traceback (most recent call last):

2016-07-08 09:28:32.573 31909 ERROR nova.compute.manager [instance:
ded46ee2-9c8f-45f7-b29f-a2d6e0e08b88]   File
"/usr/lib/python2.7/site-packages/nova/compute/manager.py", line 2156, in
_build_resources

2016-07-08 09:28:32.573 31909 ERROR nova.compute.manager [instance:
ded46ee2-9c8f-45f7-b29f-a2d6e0e08b88] yield resources

2016-07-08 09:28:32.573 31909 ERROR nova.compute.manager [instance:
ded46ee2-9c8f-45f7-b29f-a2d6e0e08b88]   File
"/usr/lib/python2.7/site-packages/nova/compute/manager.py", line 2009, in
_build_and_run_instance

2016-07-08 09:28:32.573 31909 ERROR nova.compute.manager [instance:
ded46ee2-9c8f-45f7-b29f-a2d6e0e08b88]
block_device_info=block_device_info)

2016-07-08 09:28:32.573 31909 ERROR nova.compute.manager [instance:
ded46ee2-9c8f-45f7-b29f-a2d6e0e08b88]   File
"/usr/lib/python2.7/site-packages/nova/virt/libvirt/driver.py", line 2527,
in spawn

2016-07-08 09:28:32.573 31909 ERROR nova.compute.manager [instance:
ded46ee2-9c8f-45f7-b29f-a2d6e0e08b88] admin_pass=admin_password)

2016-07-08 09:28:32.573 31909 ERROR nova.compute.manager [instance:
ded46ee2-9c8f-45f7-b29f-a2d6e0e08b88]   File
"/usr/lib/python2.7/site-packages/nova/virt/libvirt/driver.py", line 2953,
in _create_image

2016-07-08 09:28:32.573 31909 ERROR nova.compute.manager [instance:
ded46ee2-9c8f-45f7-b29f-a2d6e0e08b88] instance, size,
fallback_from_host)

2016-07-08 09:28:32.573 31909 ERROR nova.compute.manager [instance:
ded46ee2-9c8f-45f7-b29f-a2d6e0e08b88]   File
"/usr/lib/python2.7/site-packages/nova/virt/libvirt/driver.py", line 6406,
in _try_fetch_image_cache

2016-07-08 09:28:32.573 31909 ERROR nova.compute.manager [instance:
ded46ee2-9c8f-45f7-b29f-a2d6e0e08b88] size=size)

2016-07-08 09:28:32.573 31909 ERROR nova.compute.manager [instance:
ded46ee2-9c8f-45f7-b29f-a2d6e0e08b88]   File
"/usr/lib/python2.7/site-packages/nova/virt/libvirt/imagebackend.py", line
240, in cache

2016-07-08 09:28:32.573 31909 ERROR nova.compute.manager [instance:
ded46ee2-9c8f-45f7-b29f-a2d6e0e08b88] *args, **kwargs)

2016-07-08 09:28:32.573 31909 ERROR nova.compute.manager [instance:
ded46ee2-9c8f-45f7-b29f-a2d6e0e08b88]   File
"/usr/lib/python2.7/site-packages/nova/virt/libvirt/imagebackend.py", line
811, in create_image

2016-07-08 09:28:32.573 31909 ERROR nova.compute.manager [instance:
ded46ee2-9c8f-45f7-b29f-a2d6e0e08b88] prepare_template(target=base,
max_size=size, 

[ceph-users] Bad performance while deleting many small objects via radosgw S3

2016-07-08 Thread Martin Emrich
Hi!

Our little dev ceph cluster (nothing fancy; 3x1 OSD with 100GB each, 3x monitor 
with radosgw) takes over 20 minutes to delete ca. 44000 small objects (<1GB in 
total).
Deletion is done by listing objects in blocks of 1000 and then deleting them in 
one call for each block; each deletion of 1000 objects takes ca. 45s.

The monitor/radosgw hosts have a load of 0.03, the OSD hosts have only ca. 25% 
CPU usage, and ca. 5-10% iowait.

So nothing is really looking like a bottleneck.

Any Ideas on how to speed this up massively?
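One thing that might be worth ruling out is the RGW garbage collector: a delete only unlinks the head object, and the tail rados objects are reaped later by GC. Something like (check radosgw-admin help on your version):

radosgw-admin gc list --include-all | head
radosgw-admin gc process       # force a GC pass instead of waiting for the background run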

Pools:

# ceph osd pool ls  detail
pool 0 'rbd' replicated size 3 min_size 2 crush_ruleset 0 object_hash rjenkins 
pg_num 64 pgp_num 64 last_change 557 flags hashpspool stripe_width 0
pool 1 '.rgw.root' replicated size 3 min_size 2 crush_ruleset 0 object_hash 
rjenkins pg_num 200 pgp_num 200 last_change 558 flags hashpspool stripe_width 0
pool 2 'default.rgw.control' replicated size 3 min_size 2 crush_ruleset 0 
object_hash rjenkins pg_num 200 pgp_num 200 last_change 559 flags hashpspool 
stripe_width 0
pool 3 'default.rgw.data.root' replicated size 3 min_size 2 crush_ruleset 0 
object_hash rjenkins pg_num 200 pgp_num 200 last_change 560 flags hashpspool 
stripe_width 0
pool 4 'default.rgw.gc' replicated size 3 min_size 2 crush_ruleset 0 
object_hash rjenkins pg_num 200 pgp_num 200 last_change 561 flags hashpspool 
stripe_width 0
pool 5 'default.rgw.log' replicated size 3 min_size 2 crush_ruleset 0 
object_hash rjenkins pg_num 200 pgp_num 200 last_change 562 flags hashpspool 
stripe_width 0
pool 6 'default.rgw.users.uid' replicated size 3 min_size 2 crush_ruleset 0 
object_hash rjenkins pg_num 200 pgp_num 200 last_change 563 flags hashpspool 
stripe_width 0
pool 7 'default.rgw.users.keys' replicated size 3 min_size 2 crush_ruleset 0 
object_hash rjenkins pg_num 200 pgp_num 200 last_change 564 flags hashpspool 
stripe_width 0
pool 8 'default.rgw.meta' replicated size 3 min_size 2 crush_ruleset 0 
object_hash rjenkins pg_num 200 pgp_num 200 last_change 565 flags hashpspool 
stripe_width 0
pool 9 'default.rgw.buckets.index' replicated size 3 min_size 2 crush_ruleset 0 
object_hash rjenkins pg_num 200 pgp_num 200 last_change 566 flags hashpspool 
stripe_width 0
pool 10 'default.rgw.buckets.data' replicated size 3 min_size 2 crush_ruleset 0 
object_hash rjenkins pg_num 200 pgp_num 200 last_change 567 flags hashpspool 
stripe_width 0

Config:

[global]
fsid = cfaf0f4e-3b09-49e8-875b-4b114b0c4842
public_network = 0.0.0.0/0
mon_initial_members = ceph-kl-mon1
mon_host = 10.12.83.229, 10.12.81.212, 10.12.83.6
auth_cluster_required = cephx
auth_service_required = cephx
auth_client_required = cephx
filestore_xattr_use_omap = true
rgw zonegroup root pool = .rgw.root
osd pool default size = 2
osd pool default min size = 2
osd pool default pg num = 200
osd pool default pgp num = 200
mon_pg_warn_max_per_osd = 0
mon pg warn max object skew = 0

[osd]
osd op threads = 8
osd disk threads = 8
osd op queue = prio
osd recovery max active = 32
osd recovery threads = 4

[client.radosgw]
rgw zone = default
rgw zone root pool = .rgw.root
keyring = /etc/ceph/ceph.client.radosgw.keyring
rgw socket path = /var/run/ceph/ceph.radosgw.gateway.fastcgi.sock
log file = /var/log/radosgw/client.radosgw.gateway.log
rgw print continue = false
rgw cache enabled = true
rgw cache lru size = 5
rgw num rados handles = 50
rgw num control oids = 16
rgw gc max objs = 1000
rgw exit timeout secs = 300

[client.radosgw.ceph-kl-mon1]
host = ceph-kl-mon1
rgw cache enabled = true
rgw cache lru size = 5
rgw num rados handles = 50
rgw num control oids = 16
rgw gc max objs = 1000
rgw exit timeout secs = 300

[client.radosgw.ceph-kl-mon2]
host = ceph-kl-mon2
rgw cache enabled = true
rgw cache lru size = 5
rgw num rados handles = 50
rgw num control oids = 16
rgw gc max objs = 1000
rgw exit timeout secs = 300

[client.radosgw.ceph-kl-mon3]
host = ceph-kl-mon3
rgw cache enabled = true
rgw cache lru size = 5
rgw num rados handles = 50
rgw num control oids = 16
rgw gc max objs = 1000
rgw exit timeout secs = 300


As you see, I already tried some tweaks to the radosgw config, but no positive 
effect.

Or is radosgw just not designed for this load (lots of really small objects)?

Thanks

Martin
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] ceph-fuse segfaults ( jewel 10.2.2)

2016-07-08 Thread John Spray
On Fri, Jul 8, 2016 at 8:01 AM, Goncalo Borges
 wrote:
> Hi Brad, Patrick, All...
>
> I think I've understood this second problem. In summary, it is memory
> related.
>
> This is how I found the source of the problem:
>
> 1./ I copied and adapted the user application to run in another cluster of
> ours. The idea was for me to understand the application and run it myself to
> collect logs and so on...
>
> 2./ Once I submit it to this other cluster, everything went fine. I was
> hammering cephfs from multiple nodes without problems. This pointed to
> something different between the two clusters.
>
> 3./ I've started to look better to the segmentation fault message, and
> assuming that the names of the methods and functions do mean something, the
> log seems related to issues on the management of objects in cache. This
> pointed to a memory related problem.
>
> 4./ On the cluster where the application run successfully, machines have
> 48GB of RAM and 96GB of SWAP (don't know why we have such a large SWAP size,
> it is a legacy setup).
>
> # top
> top - 00:34:01 up 23 days, 22:21,  1 user,  load average: 12.06, 12.12,
> 10.40
> Tasks: 683 total,  13 running, 670 sleeping,   0 stopped,   0 zombie
> Cpu(s): 49.7%us,  0.6%sy,  0.0%ni, 49.7%id,  0.1%wa,  0.0%hi,  0.0%si,
> 0.0%st
> Mem:  49409308k total, 29692548k used, 19716760k free,   433064k buffers
> Swap: 98301948k total,0k used, 98301948k free, 26742484k cached
>
> 5./ I have noticed that ceph-fuse (in 10.2.2) consumes about 1.5 GB of
> virtual memory when there is no applications using the filesystem.
>
>  7152 root  20   0 1108m  12m 5496 S  0.0  0.0   0:00.04 ceph-fuse
>
> When I only have one instance of the user application running, ceph-fuse (in
> 10.2.2) slowly rises with time up to 10 GB of memory usage.
>
> if I submit a large number of user applications simultaneously, ceph-fuse
> goes very fast to ~10GB.
>
>   PID USER  PR  NI  VIRT  RES  SHR S %CPU %MEMTIME+  COMMAND
> 18563 root  20   0 10.0g 328m 5724 S  4.0  0.7   1:38.00 ceph-fuse
>  4343 root  20   0 3131m 237m  12m S  0.0  0.5  28:24.56 dsm_om_connsvcd
>  5536 goncalo   20   0 1599m  99m  32m R 99.9  0.2  31:35.46 python
> 31427 goncalo   20   0 1597m  89m  20m R 99.9  0.2  31:35.88 python
> 20504 goncalo   20   0 1599m  89m  20m R 100.2  0.2  31:34.29 python
> 20508 goncalo   20   0 1599m  89m  20m R 99.9  0.2  31:34.20 python
>  4973 goncalo   20   0 1599m  89m  20m R 99.9  0.2  31:35.70 python
>  1331 goncalo   20   0 1597m  88m  20m R 99.9  0.2  31:35.72 python
> 20505 goncalo   20   0 1597m  88m  20m R 99.9  0.2  31:34.46 python
> 20507 goncalo   20   0 1599m  87m  20m R 99.9  0.2  31:34.37 python
> 28375 goncalo   20   0 1597m  86m  20m R 99.9  0.2  31:35.52 python
> 20503 goncalo   20   0 1597m  85m  20m R 100.2  0.2  31:34.09 python
> 20506 goncalo   20   0 1597m  84m  20m R 99.5  0.2  31:34.42 python
> 20502 goncalo   20   0 1597m  83m  20m R 99.9  0.2  31:34.32 python
>
> 6./ On the machines where the user had the segfault, we have 16 GB of RAM
> and 1GB of SWAP
>
> Mem:  16334244k total,  3590100k used, 12744144k free,   221364k buffers
> Swap:  1572860k total,10512k used,  1562348k free,  2937276k cached
>
> 7./ I think what is happening is that once the user submits his sets of
> jobs, the memory usage goes to the very limit on this type of machine, and the
> rise is actually so fast that ceph-fuse segfaults before the OOM killer can
> kill it.
>
> 8./ We have run the user application in the same type of machines but with
> 64 GB of RAM and 1GB of SWAP, and everything goes fine also here.
>
>
> So, in conclusion, our second problem (besides the locks which was fixed by
> Pat patch) is the memory usage profile of ceph-fuse in 10.2.2 which seems to
> be very different than what it was in ceph-fuse 9.2.0.
>
> Are there any ideas on how we can limit the virtual memory usage of ceph-fuse
> in 10.2.2?

The fuse client is designed to limit its cache sizes:
client_cache_size (default 16384) inodes of cached metadata
client_oc_size (default 200MB) bytes of cached data
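If you want to clamp these further, a minimal example for the client's ceph.conf would be (values purely illustrative):

[client]
    client cache size = 8192
    client oc size = 104857600        # 100 MB
    fuse disable pagecache = true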

We do run the fuse client with valgrind during testing, so if it is
showing memory leaks in normal usage on your system then that's news.

The top output you've posted seems to show that ceph-fuse only
actually has 328MB resident though?

If you can reproduce the memory growth, then it would be good to:
 * Try running ceph-fuse with valgrind --tool=memcheck to see if it's leaking
 * Inspect inode count (ceph daemon  status) to see if
it's obeying its limit
 * Enable objectcacher debug (debug objectcacher = 10) and look at the
output (from the "trim" lines) to see if it's obeying its limit
 * See if fuse_disable_pagecache setting makes a difference

Also, is the version of fuse the same on the nodes running 9.2.0 vs.
the nodes running 10.2.2?

John

> Cheers
> Goncalo
>
>
>
> On 07/08/2016 09:54 AM, Brad Hubbard wrote:
>
> Hi Goncalo,
>
> If possible it would be great 

Re: [ceph-users] ceph/daemon mon not working and status exit (1)

2016-07-08 Thread Daniel Gryniewicz

On 07/07/2016 08:06 PM, Rahul Talari wrote:

I am trying to use Ceph in Docker. I have built the ceph/base and
ceph/daemon DockeFiles. I am trying to deploy a Ceph monitor according
to the instructions given in the tutorial but when I execute the command
without KV store and type:

sudo docker ps

I am not able to keep the monitor up. What mistakes am I performing with
doing so? Is there something I should do to get it up and running
continuously without failing?

Thank you



Hi, Rahul.

There are several things you need, the most important of which are the ceph 
configuration and keys.  These are generated on the host and volume-mounted 
into the container.


You can use the "docker logs" to get the output from the failed 
container to see what might be causing the issue.
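Roughly (container id, IP and network below are examples; see the ceph/daemon README for the exact flags your image version expects):

docker ps -a                     # failed containers show up here with their exit status
docker logs <container-id>
docker run -d --net=host \
  -v /etc/ceph:/etc/ceph -v /var/lib/ceph:/var/lib/ceph \
  -e MON_IP=192.168.0.20 -e CEPH_PUBLIC_NETWORK=192.168.0.0/24 \
  ceph/daemon mon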


Daniel

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Resize when booting from volume fails

2016-07-08 Thread mario martinez
Hi,

We are running Openstack Liberty with a Ceph Jewel backend for glance,
cinder, and nova.

Creating a new instance booting from volume works fine, but resizing this
fails with the following:

error opening image 9257fcc2-94b5-4c3f-950a-eadee03550a6_disk at snapshot
None, error code 500.

Full traceback is here: http://pastebin.com/hZkfZQ1A

Has anyone seen this problem, worked around it, or know if this is fixed in
Mitaka?

Thanks in advance,

Mario
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] repomd.xml: [Errno 14] HTTP Error 404 - Not Found on download.ceph.com for rhel7

2016-07-08 Thread Martin Palma
It seems that the packages "ceph-release-*.noarch.rpm" contain a
ceph.repo pointing to the baseurl
"http://ceph.com/rpm-hammer/rhel7/$basearch; which does not exist. It
should probably point to "http://ceph.com/rpm-hammer/el7/$basearch;.
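
Until the packages are fixed, a quick workaround on affected nodes is to
rewrite the baseurl in the repo file (assuming the stock location; back it
up first):

    sed -i.bak 's|rpm-hammer/rhel7|rpm-hammer/el7|' /etc/yum.repos.d/ceph.repo
    yum clean all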

- Martin

On Thu, Jul 7, 2016 at 5:57 PM, Martin Palma  wrote:
> Hi All,
>
> it seems that the "rhel7" folder/symlink on
> "download.ceph.com/rpm-hammer" does not exist anymore therefore
> ceph-deploy fails to deploy a new cluster. Just tested it by setting
> up a new lab environment.
>
> We have the same issue on our production cluster currently, which
> keeps us of updating it. Simple fix would be to change the url to
> "download.ceph.com/rpm-hammer/el7/..."  in the repo files I guess
>
> Any thoughts on that?
>
> We are running on CentOS 7.2.
>
> Best,
> Martin
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] 5 pgs of 712 stuck in active+remapped

2016-07-08 Thread Micha Krause

Hi,

As far as I know, this is exactly the problem the new tunables were introduced 
to solve: if you use 3 replicas with only 3 hosts, CRUSH sometimes doesn't 
find a solution to place all PGs.

If you are really stuck with bobtail tunables, I can think of 2 possible 
workarounds:

1. Add another osd Server.
2. Bad idea, but could work: build your crush rule manually, e.g. set all 
primary PGs to host ceph1, the first copy to host ceph2 and the second copy 
to host ceph3 (see the sketch below).
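
For option 2, a rule along these lines is what I mean; this is an untested 
sketch (the ruleset id is arbitrary), so check it with crushtool before 
applying it to a pool:

    rule fixed_three_hosts {
            ruleset 2
            type replicated
            min_size 3
            max_size 3
            step take ceph1
            step choose firstn 1 type osd
            step emit
            step take ceph2
            step choose firstn 1 type osd
            step emit
            step take ceph3
            step choose firstn 1 type osd
            step emit
    }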

Micha Krause

Am 08.07.2016 um 05:47 schrieb Nathanial Byrnes:

Hello,
 I've got a Jewel Cluster (3 nodes, 15 OSD's) running with bobtail tunables 
(my xenserver cluster uses 3.10 as the kernel and there's no upgrading 
that). I started the cluster out on Hammer, upgraded to Jewel, discovered 
that optimal tunables would not work, and then set the tunables back to 
bobtail. Once the re-balancing completed, I was stuck with 1 pg in 
active+remapped. Repair didn't fix the pg. I then upped the number of pgs from 
328 to 712 (oddly I asked for 512, but ended up with 712...), now I have 5 pgs 
stuck in active+remapped. I also tried re-weighting the pgs a couple of times, 
but no change. Here is my osd tree:

ID WEIGHT   TYPE NAME  UP/DOWN REWEIGHT PRIMARY-AFFINITY
-1 15.0 root default
-2  5.0 host ceph1
  0  1.0 osd.0   up  0.95001  1.0
  1  1.0 osd.1   up  1.0  1.0
  2  1.0 osd.2   up  1.0  1.0
  3  1.0 osd.3   up  0.90002  1.0
  4  1.0 osd.4   up  1.0  1.0
-3  5.0 host ceph3
10  1.0 osd.10  up  1.0  1.0
11  1.0 osd.11  up  1.0  1.0
12  1.0 osd.12  up  1.0  1.0
13  1.0 osd.13  up  1.0  1.0
14  1.0 osd.14  up  1.0  1.0
-4  5.0 host ceph2
  5  1.0 osd.5   up  1.0  1.0
  6  1.0 osd.6   up  1.0  1.0
  7  1.0 osd.7   up  1.0  1.0
  8  1.0 osd.8   up  1.0  1.0
  9  1.0 osd.9   up  1.0  1.0


 Any suggestions on how to troubleshoot or repair this?

 Thanks and Regards,
 Nate



___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] RBD Watch Notify for snapshots

2016-07-08 Thread Nick Fisk
Thanks Jason,

I think I'm going to start with a bash script which SSHes into the machine to 
check if the process has finished writing and then calls fsfreeze, as I've got 
time constraints on getting this working. But I will definitely revisit this 
and see if I can create something which does as you have described, as it 
would be a much neater solution.
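
Something along these lines is what I have in mind; the hostnames, mountpoint, 
process and image names are obviously placeholders:

    #!/bin/bash
    SRC_HOST="app01"
    MOUNTPOINT="/mnt/rbd0"
    IMAGE="rbd/data-disk"

    # wait for the writer process on the source machine to finish
    while ssh "$SRC_HOST" pgrep -x my_writer >/dev/null; do
        sleep 30
    done

    # flush and freeze the filesystem, snapshot, then thaw
    ssh "$SRC_HOST" "sync && fsfreeze -f $MOUNTPOINT"
    rbd snap create "${IMAGE}@backup-$(date +%Y%m%d-%H%M)"
    ssh "$SRC_HOST" "fsfreeze -u $MOUNTPOINT"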

Nick

> -Original Message-
> From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of 
> Jason Dillaman
> Sent: 08 July 2016 04:02
> To: n...@fisk.me.uk
> Cc: ceph-users 
> Subject: Re: [ceph-users] RBD Watch Notify for snapshots
> 
> librbd pseudo-automatically handles this by flushing the cache to the 
> snapshot when a new snapshot is created, but I don't think krbd
> does the same. If it doesn't, it would probably be a nice addition to the 
> block driver to support the general case.
> 
> Baring that (or if you want to involve something like fsfreeze), I think the 
> answer depends on how much you are willing to write some
> custom C/C++ code (I don't think the rados python library exposes 
> watch/notify APIs). A daemon could register a watch on a custom
> per-host/image/etc object which would sync the disk when a notification is 
> received. Prior to creating a snapshot, you would need to
> send a notification to this object to alert the daemon to sync/fsfreeze/etc.
> 
> On Thu, Jul 7, 2016 at 12:33 PM, Nick Fisk  wrote:
> Hi All,
> 
> I have a RBD mounted to a machine via the kernel client and I wish to be able 
> to take a snapshot and mount it to another machine
> where it can be backed up.
> 
> The big issue is that I need to make sure that the process writing on the 
> source machine is finished and the FS is sync'd before
> taking the snapshot.
> 
> My question. Is there something I can do with Watch/Notify to trigger this 
> checking/sync process on the source machine before the
> snapshot is actually taken?
> 
> Thanks,
> Nick
> 
> ___
> ceph-users mailing list
> mailto:ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> 
> 
> 
> 
> --
> Jason

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] multiple journals on SSD

2016-07-08 Thread Nick Fisk
> -Original Message-
> From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of 
> Zoltan Arnold Nagy
> Sent: 08 July 2016 08:51
> To: Christian Balzer 
> Cc: ceph-users ; n...@fisk.me.uk
> Subject: Re: [ceph-users] multiple journals on SSD
> 
> Hi Christian,
> 
> 
> On 08 Jul 2016, at 02:22, Christian Balzer  wrote:
> 
> 
> Hello,
> 
> On Thu, 7 Jul 2016 23:19:35 +0200 Zoltan Arnold Nagy wrote:
> 
> 
> Hi Nick,
> 
> How large NVMe drives are you running per 12 disks?
> 
> In my current setup I have 4xP3700 per 36 disks but I feel like I could
> get by with 2… Just looking for community experience :-)
> This is funny, because you ask Nick about the size and don't mention it
> yourself. ^o^
> 
> You are absolutely right, my bad. We are using the 400GB models.
> 
> 
> As I speculated in my reply, it's the 400GB model and Nick didn't dispute
> that.
> And I shall assume the same for you.
> 
> You could get by with 2 of the 400GB ones, but that depends on a number of
> things.
> 
> 1. What's your use case, typical usage pattern?
> Are you doing a lot of large sequential writes or is it mostly smallish
> I/Os?
> HDD OSDs will clock in at about 100MB/s with OSD bench, but realistically
> not see more than 50-60MB/s, so with 18 of them per one 400GB P3700 you're
> about on par.
> 
> Our usage varies so much that it’s hard to put a finger on it.
> Some days it’s this, some days it’s that. Internal cloud with att bunch of 
> researchers.

What I have seen is that where something like a SAS/SATA SSD has an almost 
linear response of latency against load, NVMes start off with a shallower 
curve. You probably want to look at how hard your current journals are getting 
hit. If they are much above 25-50% utilisation I would hesitate to put much 
more load on them for latency reasons, unless you are just going for big 
buffered write performance. You could probably drop down to maybe using 3 for 
every 12 disks though?
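
A quick way to gauge that is to watch the journal devices with iostat and 
keep an eye on the %util and await columns (the device name is just an 
example):

    iostat -x -d 5 /dev/nvme0n1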

This set of slides was very interesting when I was planning my latest nodes.

https://indico.cern.ch/event/320819/contributions/742938/attachments/618990/851639/SSD_Benchmarking_at_CERN__HEPiX_Fall_2014.pdf


> 
> 
> 
> 2. What's your network setup? If you have more than 20Gb/s to that node,
> your journals will likely become the (write) bottleneck.
> But that's only the case with backfills or again largish sequential writes
> of course.
> Currently it’s bonded (LACP) 2x10Gbit for both the front and backend, but 
> soon going to
> upgrade to 4x10Gbit front and 2x100Gbit back. (Already have a test cluster 
> with this setup).
> 
> 
> 3. A repeat of sorts of the previous 2 points, this time with the focus on
> endurance. How much data are you writing per day to an average OSD?
> With 18 OSDs per 400GB P3700 NVMe you will want that to be less than
> 223GB/day/OSD.
> 
> We’re growing at around 100TB/month spread over ~130 OSDs at the moment which 
> gives me ~25GB/OSD
> (I wish it would be that uniformly distributed :))
> 
> 
> 4. As usual, failure domains. In the case of a NVMe failure you'll loose
> twice the amount of OSDs.
> Right, but having a lot of nodes (20+) mitigates this somewhat.
> 
> 
> That all being said, at 36 OSDs I'd venture you'll run out of CPU steam
> (with small write IOPS) before your journals become the bottleneck.
> I agree, but that has not been the case so far.
> 
> 
> Christian
> 
> 
> Cheers,
> Zoltan
> [snip]
> 
> 
> --
> Christian BalzerNetwork/Systems Engineer
> mailto:ch...@gol.com  Global OnLine Japan/Rakuten Communications
> http://www.gol.com/


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Jewel Multisite RGW Memory Issues

2016-07-08 Thread Ben Agricola
So I've narrowed this down a bit further, I *think* this is happening
during bucket listing - I started a radosgw process with increased logging,
and killed it as soon as I saw the RSS jump. This was accompanied by a ton
of logs from 'RGWRados::cls_bucket_list' printing out the names of the
files in one of the buckets - probably 5000 lines total.
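
For anyone trying to reproduce this: output at that level of detail usually 
just needs the rgw debug level raised in ceph.conf (the section name below 
should match your own rgw instance name), e.g.:

    [client.rgw.gateway-1]
        debug rgw = 20
        debug ms = 1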

The OP of the request that generated the bucket list was
'25RGWListBucket_ObjStore_S3', and it appears to have been made by one of the
RGW nodes in the other site.

Any ideas?

Ben.


On Mon, 27 Jun 2016 at 10:47 Ben Agricola  wrote:

> Hi Pritha,
>
> Urgh, not sure what happened to the formatting there - let's try again.
>
> At the time, the 'primary' cluster (i.e. the one with the active data set)
> was receiving backup files from a small number of machines, prior to
> replication being enabled it was using ~10% RAM on the RadosGW boxes.
>
> Without replication enabled, neither cluster sees any spikes in memory
> usage under normal operation, with a slight increase when deep scrubbing
> (I'm monitoring cluster memory usage as a whole so OSD memory increases
> would account for that).
>
> Neither cluster was performing a deep scrub at the time. The 'secondary'
> cluster (i.e. the one I was trying to sync data to, which now has
> replication disabled again) has now had a RadosGW process running under
> normal load since June 17 with replication disabled and is using 1084M
> RSS. This matches with historical graphing for the primary cluster, which
> has hovered around 1G RSS for RadosGW processes for the last 6 months.
>
> I've just tested this out this morning and enabling replication caused all
> RadosGW processes to increase in memory usage (and continue increasing)
> from ~1000M RSS to ~20G RSS in about 2 minutes. As soon as replication is
> enabled (as in, within seconds) RSS of RadosGW on both clusters starts to
> increase and does not drop. This appears to happen during metadata sync
> as well as during normal data syncing.
>
>
> I then killed all RadosGW processes on the 'primary' side, and memory
> usage of the RadosGW processes on the 'secondary' side continue to increase
> in usage at the same rate. There are no further messages in the RadosGW
> log as this is occurring (since there is no client traffic and no further
> replication traffic). If I kill the active RadosGW processes then they
> start back up and normal memory usage resumes.
>
> Cheers,
>
> Ben.
>
>
> On Mon, 27 Jun 2016 at 10:39 Ben Agricola  wrote:
>
>> Hi Pritha,
>>
>>
>> At the time, the 'primary' cluster (i.e. the one with the active data set) 
>> was receiving backup files from a small number of machines, prior to 
>> replication being
>>
>> enabled it was using ~10% RAM on the RadosGW boxes.
>>
>>
>> Without replication enabled, neither cluster sees any spikes in memory usage 
>> under normal operation, with a slight increase when deep scrubbing (I'm 
>> monitoring
>>
>> cluster memory usage as a whole so OSD memory increases would account for 
>> that). Neither cluster was performing a deep scrub at the time. The 
>> 'secondary' cluster
>>
>> (i.e. the one I was trying to sync data to, which now has replication 
>> disabled again) has now had a RadosGW process running under normal load 
>> since June 17
>>
>> with replication disabled and is using 1084M RSS. This matches with 
>> historical graphing for the primary cluster, which has hovered around 1G RSS 
>> for RadosGW
>>
>> processes for the last 6 months.
>>
>>
>> I've just tested this out this morning and enabling replication caused all 
>> RadosGW processes to increase in memory usage (and continue increasing) from 
>> ~1000M RSS
>>
>> to ~20G RSS in about 2 minutes. As soon as replication is enabled (as in, 
>> within seconds) RSS of RadosGW on both clusters starts to increase and does 
>> not drop. This
>>
>> appears to happen during metadata sync as well as during normal data syncing 
>> as well.
>>
>>
>> I then killed all RadosGW processes on the 'primary' side, and memory usage 
>> of the RadosGW processes on the 'secondary' side continue to increase in 
>> usage at
>>
>> the same rate. There are no further messages in the RadosGW log as this is 
>> occurring (since there is no client traffic and no further replication 
>> traffic).
>>
>> If I kill the active RadosGW processes then they start back up and normal 
>> memory usage resumes.
>>
>> Cheers,
>>
>> Ben.
>>
>>
>> - Original Message -
>> > From: "Pritha Srivastava" > > >
>> > To: ceph-users@... 
>> > 
>> > Sent: Monday, June 27, 2016 07:32:23
>> > Subject: Re: [ceph-users] Jewel Multisite RGW Memory Issues
>>
>> > Do you know if the memory usage is high only during load from clients and 
>> > is
>> > steady 

Re: [ceph-users] multiple journals on SSD

2016-07-08 Thread Zoltan Arnold Nagy
Hi Christian,

> On 08 Jul 2016, at 02:22, Christian Balzer  wrote:
>
> Hello,
>
> On Thu, 7 Jul 2016 23:19:35 +0200 Zoltan Arnold Nagy wrote:
>
> Hi Nick,
>
> How large NVMe drives are you running per 12 disks?
>
> In my current setup I have 4xP3700 per 36 disks but I feel like I could
> get by with 2… Just looking for community experience :-)
>
> This is funny, because you ask Nick about the size and don't mention it
> yourself. ^o^

You are absolutely right, my bad. We are using the 400GB models.

> As I speculated in my reply, it's the 400GB model and Nick didn't dispute
> that.
> And I shall assume the same for you.
>
> You could get by with 2 of the 400GB ones, but that depends on a number of
> things.
>
> 1. What's your use case, typical usage pattern?
> Are you doing a lot of large sequential writes or is it mostly smallish
> I/Os?
> HDD OSDs will clock in at about 100MB/s with OSD bench, but realistically
> not see more than 50-60MB/s, so with 18 of them per one 400GB P3700 you're
> about on par.

Our usage varies so much that it’s hard to put a finger on it.
Some days it’s this, some days it’s that. Internal cloud with att bunch of
researchers.

> 2. What's your network setup? If you have more than 20Gb/s to that node,
> your journals will likely become the (write) bottleneck.
> But that's only the case with backfills or again largish sequential writes
> of course.

Currently it’s bonded (LACP) 2x10Gbit for both the front and backend, but
soon going to upgrade to 4x10Gbit front and 2x100Gbit back. (Already have a
test cluster with this setup).

> 3. A repeat of sorts of the previous 2 points, this time with the focus on
> endurance. How much data are you writing per day to an average OSD?
> With 18 OSDs per 400GB P3700 NVMe you will want that to be less than
> 223GB/day/OSD.

We’re growing at around 100TB/month spread over ~130 OSDs at the moment
which gives me ~25GB/OSD (I wish it would be that uniformly distributed :))

> 4. As usual, failure domains. In the case of a NVMe failure you'll loose
> twice the amount of OSDs.

Right, but having a lot of nodes (20+) mitigates this somewhat.

> That all being said, at 36 OSDs I'd venture you'll run out of CPU steam
> (with small write IOPS) before your journals become the bottleneck.

I agree, but that has not been the case so far.

> Christian

Cheers,
Zoltan

> [snip]
>
> --
> Christian Balzer        Network/Systems Engineer
> ch...@gol.com           Global OnLine Japan/Rakuten Communications
> http://www.gol.com/
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Ceph cluster upgrade

2016-07-08 Thread Kees Meijs
Thank you everyone, I just tested and verified the ruleset and applied
it to some pools. Worked like a charm!
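
For anyone finding this thread later, the approach in the linked post boils 
down to something like this (the file name and rule id are just examples):

    ceph osd getcrushmap -o crush.bin
    crushtool -i crush.bin --test --rule 1 --num-rep 3 --show-mappings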

K.

On 06-07-16 19:20, Bob R wrote:
> See http://dachary.org/?p=3189 for some simple instructions on testing
> your crush rule logic.

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] (no subject)

2016-07-08 Thread Fran Barrera
Hello,

You only need to create a pool and an authentication key in Ceph for cinder.
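
For example (the pool name, PG count and caps are only an illustration;
adjust them to your setup):

    ceph osd pool create volumes 128
    ceph auth get-or-create client.cinder \
        mon 'allow r' \
        osd 'allow class-read object_prefix rbd_children, allow rwx pool=volumes'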

Your configuration should be like this (This is an example configuration
with Ceph Jewel and Openstack Mitaka):


[DEFAULT]
enabled_backends = ceph
[ceph]
volume_driver = cinder.volume.drivers.rbd.RBDDriver
rbd_pool = volumes
rbd_ceph_conf = /etc/ceph/ceph.conf
rbd_flatten_volume_from_snapshot = false
rbd_max_clone_depth = 5
rbd_store_chunk_size = 4
rados_connect_timeout = -1
glance_api_version = 2
rbd_user = cinder
rbd_secret_uuid = c35bd3d8-ec12-2052-9672d-334824635616

And then remove the cinder database, recreate it and populate it with
"cinder-manage db sync". Finally, restart the cinder services and everything
should work fine.


Regards,
Fran.

2016-07-08 8:18 GMT+02:00 Kees Meijs :

> Hi Gaurav,
>
> The following snippets should suffice (for Cinder, at least):
>
> [DEFAULT]
> enabled_backends=rbd
>
> [rbd]
> volume_driver = cinder.volume.drivers.rbd.RBDDriver
> rbd_pool = cinder-volumes
> rbd_ceph_conf = /etc/ceph/ceph.conf
> rbd_flatten_volume_from_snapshot = false
> rbd_max_clone_depth = 5
> rbd_store_chunk_size = 4
> rados_connect_timeout = -1
> rbd_user = cinder
> rbd_secret = REDACTED
>
> backup_driver = cinder.backup.drivers.ceph
> backup_ceph_conf = /etc/ceph/ceph.conf
> backup_ceph_user = cinder-backup
> backup_ceph_chunk_size = 134217728
> backup_ceph_pool = backups
> backup_ceph_stripe_unit = 0
> backup_ceph_stripe_count = 0
> restore_discard_excess_bytes = true
>
>
> Obviously you'd alter the directives according to your configuration
> and/or wishes.
>
> And no, creating RBD volumes by hand is not needed. Cinder will do this
> for you.
>
> K.
>
> On 08-07-16 04:14, Gaurav Goyal wrote:
>
> Yeah i didnt find additional section for [ceph] in my cinder.conf file.
> Should i create that manually?
> As i didnt find [ceph] section so i modified same parameters in [DEFAULT]
> section.
> I will change that as per your suggestion.
>
> Moreoevr checking some other links i got to know that, i must configure
> following additional parameters
> should i do that and install tgtadm package?
>
> rootwrap_config = /etc/cinder/rootwrap.confapi_paste_confg = 
> /etc/cinder/api-paste.iniiscsi_helper = tgtadmvolume_name_template = 
> volume-%svolume_group = cinder-volumes
>
> Do i need to execute following commands?
>
> "pvcreate /dev/rbd1" &"vgcreate cinder-volumes /dev/rbd1"
>
>
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] ceph-fuse segfaults ( jewel 10.2.2)

2016-07-08 Thread Goncalo Borges

Hi Brad, Patrick, All...

I think I've understood this second problem. In summary, it is memory 
related.


This is how I found the source of the problem:

   1./ I copied and adapted the user application to run in another
   cluster of ours. The idea was for me to understand the application
   and run it myself to collect logs and so on...

   2./ Once I submitted it to this other cluster, everything went fine. I
   was hammering cephfs from multiple nodes without problems. This
   pointed to something different between the two clusters.

   3./ I started to look more closely at the segmentation fault message,
   and assuming that the names of the methods and functions do mean
   something, the log seems related to issues in the management of
   objects in cache. This pointed to a memory related problem.

   4./ On the cluster where the application run successfully, machines
   have 48GB of RAM and 96GB of SWAP (don't know why we have such a
   large SWAP size, it is a legacy setup).

   # top
   top - 00:34:01 up 23 days, 22:21,  1 user,  load average: 12.06,
   12.12, 10.40
   Tasks: 683 total,  13 running, 670 sleeping,   0 stopped,   0 zombie
   Cpu(s): 49.7%us,  0.6%sy,  0.0%ni, 49.7%id,  0.1%wa,  0.0%hi,
   0.0%si,  0.0%st
   Mem:  49409308k total, 29692548k used, 19716760k free, 433064k
   buffers
   Swap: 98301948k total,0k used, 98301948k free, 26742484k
   cached

   5./ I have noticed that ceph-fuse (in 10.2.2) consumes about 1.5 GB
   of virtual memory when there are no applications using the filesystem.

 7152 root  20   0 1108m  12m 5496 S  0.0  0.0   0:00.04
   ceph-fuse

   When I only have one instance of the user application running,
   ceph-fuse (in 10.2.2) slowly rises with time up to 10 GB of memory
   usage.

   if I submit a large number of user applications simultaneously,
   ceph-fuse goes very fast to ~10GB.

  PID USER  PR  NI  VIRT  RES  SHR S %CPU %MEM TIME+ COMMAND
   18563 root  20   0 10.0g 328m 5724 S  4.0  0.7   1:38.00
   ceph-fuse
 4343 root  20   0 3131m 237m  12m S  0.0  0.5  28:24.56
   dsm_om_connsvcd
 5536 goncalo   20   0 1599m  99m  32m R 99.9  0.2  31:35.46
   python
   31427 goncalo   20   0 1597m  89m  20m R 99.9  0.2  31:35.88 python
   20504 goncalo   20   0 1599m  89m  20m R 100.2  0.2  31:34.29
   python
   20508 goncalo   20   0 1599m  89m  20m R 99.9  0.2  31:34.20 python
 4973 goncalo   20   0 1599m  89m  20m R 99.9  0.2  31:35.70
   python
 1331 goncalo   20   0 1597m  88m  20m R 99.9  0.2  31:35.72
   python
   20505 goncalo   20   0 1597m  88m  20m R 99.9  0.2  31:34.46 python
   20507 goncalo   20   0 1599m  87m  20m R 99.9  0.2  31:34.37 python
   28375 goncalo   20   0 1597m  86m  20m R 99.9  0.2  31:35.52 python
   20503 goncalo   20   0 1597m  85m  20m R 100.2  0.2  31:34.09
   python
   20506 goncalo   20   0 1597m  84m  20m R 99.5  0.2  31:34.42 python
   20502 goncalo   20   0 1597m  83m  20m R 99.9  0.2  31:34.32 python

   6./ On the machines where the user had the segfault, we have 16 GB
   of RAM and 1GB of SWAP

   Mem:  16334244k total,  3590100k used, 12744144k free, 221364k
   buffers
   Swap:  1572860k total,10512k used,  1562348k free, 2937276k
   cached

   7./ I think what is happening is that once the user submits his sets
   of jobs, the memory usage goes right to the limit on this type of
   machine, and the rise is so fast that ceph-fuse segfaults before the
   OOM killer can kill it.

   8./ We have run the user application on the same type of machine but
   with 64 GB of RAM and 1GB of SWAP, and everything runs fine there as
   well.


So, in conclusion, our second problem (besides the locks, which were fixed 
by Pat's patch) is the memory usage profile of ceph-fuse in 10.2.2, which 
seems to be very different from what it was in ceph-fuse 9.2.0.


Are there any ideas on how we can limit the virtual memory usage of 
ceph-fuse in 10.2.2?


Cheers
Goncalo



On 07/08/2016 09:54 AM, Brad Hubbard wrote:

Hi Goncalo,

If possible it would be great if you could capture a core file for this with
full debugging symbols (preferably glibc debuginfo as well). How you do
that will depend on the ceph version and your OS but we can offfer help
if required I'm sure.

Once you have the core do the following.

$ gdb /path/to/ceph-fuse core.
(gdb) set pag off
(gdb) set log on
(gdb) thread apply all bt
(gdb) thread apply all bt full

Then quit gdb and you should find a file called gdb.txt in your
working directory.
If you could attach that file to http://tracker.ceph.com/issues/16610

Cheers,
Brad

On Fri, Jul 8, 2016 at 12:06 AM, Patrick Donnelly  wrote:

On Thu, Jul 7, 2016 at 2:01 AM, Goncalo Borges
 wrote:

Unfortunately, the other user application breaks ceph-fuse again (It is a
completely 

Re: [ceph-users] (no subject)

2016-07-08 Thread Kees Meijs
Hi Gaurav,

The following snippets should suffice (for Cinder, at least):
> [DEFAULT]
> enabled_backends=rbd
>
> [rbd]
> volume_driver = cinder.volume.drivers.rbd.RBDDriver
> rbd_pool = cinder-volumes
> rbd_ceph_conf = /etc/ceph/ceph.conf
> rbd_flatten_volume_from_snapshot = false
> rbd_max_clone_depth = 5
> rbd_store_chunk_size = 4
> rados_connect_timeout = -1
> rbd_user = cinder
> rbd_secret = REDACTED
>
> backup_driver = cinder.backup.drivers.ceph
> backup_ceph_conf = /etc/ceph/ceph.conf
> backup_ceph_user = cinder-backup
> backup_ceph_chunk_size = 134217728
> backup_ceph_pool = backups
> backup_ceph_stripe_unit = 0
> backup_ceph_stripe_count = 0
> restore_discard_excess_bytes = true

Obviously you'd alter the directives according to your configuration
and/or wishes.

And no, creating RBD volumes by hand is not needed. Cinder will do this
for you.

K.

On 08-07-16 04:14, Gaurav Goyal wrote:
> Yeah i didnt find additional section for [ceph] in my cinder.conf
> file. Should i create that manually? 
> As i didnt find [ceph] section so i modified same parameters in
> [DEFAULT] section.
> I will change that as per your suggestion.
>
> Moreoevr checking some other links i got to know that, i must
> configure following additional parameters
> should i do that and install tgtadm package?
> rootwrap_config = /etc/cinder/rootwrap.conf
> api_paste_confg = /etc/cinder/api-paste.ini
> iscsi_helper = tgtadm
> volume_name_template = volume-%s
> volume_group = cinder-volumes
> Do i need to execute following commands? 
> "pvcreate /dev/rbd1" &
> "vgcreate cinder-volumes /dev/rbd1" 

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com