[ceph-users] Give up on backfill, remove slow OSD

2016-09-22 Thread Iain Buclaw
Hi, I currently have an OSD that has been backfilling data off it for a little over two days now, and it has gone from approximately 68 PGs to 63. As data is still being read from and written to it by clients while I'm trying to get it out of the cluster, this is not helping at all. I figured
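For reference, the documented manual-removal sequence goes roughly like this once backfill has finished draining the OSD (osd.12 is a placeholder id; adjust the stop command to your init system):

$ ceph osd out 12                      # mark it out so remaining PGs backfill elsewhere
$ sudo systemctl stop ceph-osd@12      # or "sudo stop ceph-osd id=12" on Upstart-based releases
$ ceph osd crush remove osd.12         # remove it from the CRUSH map
$ ceph auth del osd.12                 # delete its authentication key
$ ceph osd rm 12                       # remove the OSD from the cluster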

Re: [ceph-users] swiftclient call radosgw, it always response 401 Unauthorized

2016-09-22 Thread Brian Chang-Chien
Hi Radoslaw Zarzynski, I have now retried http://docs.ceph.com/docs/jewel/install/install-ceph-gateway/ and can get the radosgw log, shown below. The result is still 401; it shows "got expired token: admin:admin expired". I have highlighted it inline. 2016-09-22 17:09:40.975268 7f8292ffd700 10 content=MIIBzgYJKoZ

Re: [ceph-users] rgw bucket index manual copy

2016-09-22 Thread Василий Ангапов
Can I make an existing bucket blind? 2016-09-22 4:23 GMT+03:00 Stas Starikevich : > Ben, > > Works fine as far as I see: > > [root@273aa9f2ee9f /]# s3cmd mb s3://test > Bucket 's3://test/' created > > [root@273aa9f2ee9f /]# s3cmd put /etc/hosts s3://test > upload: '/etc/hosts' -> 's3://test/hosts' [

Re: [ceph-users] Snap delete performance impact

2016-09-22 Thread Nick Fisk
Hi Adrian, I have also hit this recently and have since increased the osd_snap_trim_sleep to try and stop this from happening again. However, I haven't had an opportunity to actually try and break it again yet, but your mail seems to suggest it might not be the silver bullet I was looking for.

Re: [ceph-users] Snap delete performance impact

2016-09-22 Thread Adrian Saul
I tried 2 this afternoon and saw the same results. Essentially the disks appear to go to 100% busy doing very small but high numbers of IO and incur massive service times (300-400ms). During that period I get blocked request errors continually. I suspect part of that might be the SATA serve

Re: [ceph-users] swiftclient call radosgw, it always response 401 Unauthorized

2016-09-22 Thread Brian Chang-Chien
Hi Radoslaw Zarzynski, after retrying many times, I find that when swift sends the keystone user to radosgw for the first time, the log dumps the following: 2016-09-22 17:10:36.077496 7f8104ff9700 10 host=10.62.9.140 2016-09-22 17:10:36.077503 7f8104ff9700 20 subdomain= domain= in_hosted_domain=0 in_hosted_domain_s

Re: [ceph-users] rgw multi-site replication issues

2016-09-22 Thread Orit Wasserman
Hi John, Can you provide your zonegroup and zones configurations on all 3 rgw? (run the commands on each rgw) Thanks, Orit On Wed, Sep 21, 2016 at 11:14 PM, John Rowe wrote: > Hello, > > We have 2 Ceph clusters running in two separate data centers, each one with > 3 mons, 3 rgws, and 5 osds. I a

Re: [ceph-users] radosgw bucket name performance

2016-09-22 Thread Василий Ангапов
Stas, Are you talking about Ceph or AWS? 2016-09-22 4:31 GMT+03:00 Stas Starikevich : > Felix, > > According to my tests there is difference in performance between usual named > buckets (test, test01, test02), uuid-named buckets (like > '7c9e4a81-df86-4c9d-a681-3a570de109db') or just date ('2016-

Re: [ceph-users] Object lost

2016-09-22 Thread Fran Barrera
Hi Jason, I've followed your steps and now I can list all available data blocks of my image, but I don't know how to rebuild a sparse image from them. I found this script (https://raw.githubusercontent.com/smmoore/ceph/master/rbd_restore.sh) and https://www.sebastien-han.fr/blog/2015/01/29/ceph-recover-a-rbd-

Re: [ceph-users] rgw bucket index manual copy

2016-09-22 Thread Василий Ангапов
And how can I make ordinary and blind buckets coexist in one Ceph cluster? 2016-09-22 11:57 GMT+03:00 Василий Ангапов : > Can I make existing bucket blind? > > 2016-09-22 4:23 GMT+03:00 Stas Starikevich : >> Ben, >> >> Works fine as far as I see: >> >> [root@273aa9f2ee9f /]# s3cmd mb s3://test >>

[ceph-users] Ceph on different OS version

2016-09-22 Thread Matteo Dacrema
Hi, has anyone ever tried to run a Ceph cluster on two different versions of the OS? In particular, I'm running a Ceph cluster half on Ubuntu 12.04 and half on Ubuntu 14.04, on the Firefly version. I'm not seeing any issues. Are there any risks? Thanks Matteo

Re: [ceph-users] Object lost

2016-09-22 Thread Jason Dillaman
You can do something like the following:
# create a sparse file the size of your image
$ dd if=/dev/zero of=rbd_export bs=1 count=0 seek=
# import the data blocks
$ POOL=images
$ PREFIX=rbd_data.1014109cf92e
$ BLOCK_SIZE=512
$ for x in $(rados --pool ${POOL} ls | grep ${PREFIX} | sort) ; do rm
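A fuller sketch of that loop, assuming the default 4 MB RBD object size and the usual rbd_data.<prefix>.<hex index> object naming; POOL, PREFIX and the image size are placeholders to fill in for your image:

POOL=images
PREFIX=rbd_data.1014109cf92e
OBJ_SIZE=$((4 * 1024 * 1024))                   # default rbd object size (order 22)
dd if=/dev/zero of=rbd_export bs=1 count=0 seek=<image size in bytes>   # sparse destination file
for obj in $(rados --pool ${POOL} ls | grep ${PREFIX} | sort); do
    idx=$((16#${obj##*.}))                      # hex object index -> decimal
    rados --pool ${POOL} get ${obj} tmp_block   # fetch one object
    dd if=tmp_block of=rbd_export bs=${OBJ_SIZE} seek=${idx} conv=notrunc   # write it at its offset
    rm -f tmp_block
done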

Re: [ceph-users] radosgw bucket name performance

2016-09-22 Thread Stas Starikevich
Hi all, Sorry, I made a typo in the previous message. According to my tests there is _no_ difference in Ceph RadosGW performance between those types of bucket names. Thanks. Stas On Thu, Sep 22, 2016 at 5:25 AM, Василий Ангапов wrote: > Stas, > > Are you talking about Ceph or AWS? > > 2016-09-22 4:31 G

Re: [ceph-users] rgw bucket index manual copy

2016-09-22 Thread Stas Starikevich
Hi, >> Can I make existing bucket blind? I didn't find a way to do that. >> And how can I make ordinary and blind buckets coexist in one Ceph cluster? The only way I see now is to change the configuration, restart the services, create the new bucket, and roll back. Maybe someone from the Ceph developers can add som

Re: [ceph-users] [EXTERNAL] Upgrading 0.94.6 -> 0.94.9 saturating mon node networking

2016-09-22 Thread Will . Boege
Just went through this upgrading a ~400 OSD cluster. I was in the EXACT spot you were in. The faster you can get all OSDs to the same version as the MONs the better. We decided to power forward and the performance got better for every OSD node we patched. Additionally I also discovered your Le

Re: [ceph-users] Ceph on different OS version

2016-09-22 Thread Lenz Grimmer
Hi, On 09/22/2016 03:03 PM, Matteo Dacrema wrote: > someone have ever tried to run a ceph cluster on two different version > of the OS? > In particular I’m running a ceph cluster half on Ubuntu 12.04 and half > on Ubuntu 14.04 with Firefly version. > I’m not seeing any issues. > Are there some ki

Re: [ceph-users] Ceph on different OS version

2016-09-22 Thread Matteo Dacrema
To be more precise, the nodes with a different OS are only the OSD nodes. Thanks Matteo

Re: [ceph-users] Ceph on different OS version

2016-09-22 Thread Wido den Hollander
> On 22 September 2016 at 16:13, Matteo Dacrema wrote: > > > To be more precise, the nodes with a different OS are only the OSD nodes. > I haven't seen real issues, but here are a few things I could think of that *potentially* might be a problem:
- Different tcmalloc versions
- Different libc versions

[ceph-users] Stuck at "Setting up ceph-osd (10.2.3-1~bpo80+1)"

2016-09-22 Thread Chris Murray
Hi all, Might anyone be able to help me troubleshoot an "apt-get dist-upgrade" which is stuck at "Setting up ceph-osd (10.2.3-1~bpo80+1)"? I'm upgrading from 10.2.2. The two OSDs on this node are up, and think they are version 10.2.3, but the upgrade doesn't appear to be finishing ... ? Thank yo

Re: [ceph-users] rgw multi-site replication issues

2016-09-22 Thread John Rowe
Hello Orit, thanks. I will do all 6 just in case. Also, as an FYI, I originally had all 6 as endpoints (3 in each zone) but have it down to just the two "1" servers talking to each other until I can get it working. Eventually I would like to have all 6 cross-connecting again. *rgw-primary-1:* radosg

[ceph-users] too many PGs per OSD when pg_num = 256??

2016-09-22 Thread Andrus, Brian Contractor
All, I am getting a warning:
  health HEALTH_WARN
        too many PGs per OSD (377 > max 300)
        pool cephfs_data has many more objects per pg than average (too few pgs?)
Yet, when I check the settings:
# ceph osd pool get rbd pg_num
pg_num: 256
# ceph osd pool get rbd pgp_num

Re: [ceph-users] Consistency problems when taking RBD snapshot

2016-09-22 Thread Ilya Dryomov
On Thu, Sep 15, 2016 at 3:18 PM, Ilya Dryomov wrote: > On Thu, Sep 15, 2016 at 2:43 PM, Nikolay Borisov wrote: >> >> [snipped] >> >> cat /sys/bus/rbd/devices/47/client_id >> client157729 >> cat /sys/bus/rbd/devices/1/client_id >> client157729 >> >> Client client157729 is alxc13, based on correlat
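A small loop over the same sysfs tree maps every kernel-mapped rbd device on a host to its client id in one pass (a sketch; device numbers are host-specific):

$ for d in /sys/bus/rbd/devices/*; do echo "rbd${d##*/} -> $(cat ${d}/client_id)"; done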

Re: [ceph-users] too many PGs per OSD when pg_num = 256??

2016-09-22 Thread David Turner
How many pools do you have? How many pgs does your total cluster have, not just your rbd pool?
ceph osd lspools
ceph -s | grep -Eo '[0-9] pgs'
My guess is that you have other pools with pgs and the cumulative total of pgs per osd is too many.

Re: [ceph-users] too many PGs per OSD when pg_num = 256??

2016-09-22 Thread David Turner
Forgot the + for the regex.
ceph -s | grep -Eo '[0-9]+ pgs'
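For a per-pool breakdown rather than the cluster-wide total, the pool lines of the OSD map also carry pg_num and the replica size:

$ ceph osd dump | grep '^pool'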

Re: [ceph-users] too many PGs per OSD when pg_num = 256??

2016-09-22 Thread Andrus, Brian Contractor
David, I have 15 pools:
# ceph osd lspools|sed 's/,/\n/g'
0 rbd
1 cephfs_data
2 cephfs_metadata
3 vmimages
14 .rgw.root
15 default.rgw.control
16 default.rgw.data.root
17 default.rgw.gc
18 default.rgw.log
19 default.rgw.users.uid
20 default.rgw.users.keys
21 default.rgw.users.email
22 default.rgw.m

Re: [ceph-users] too many PGs per OSD when pg_num = 256??

2016-09-22 Thread David Turner
So you have 3,520 pgs. Assuming all of your pools are using 3 replicas, and using the 377 pgs/osd in your health_warn state, that would mean your cluster has 28 osds. When you calculate how many pgs a pool should have, you need to account for how many osds you have, how much percentage of data
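The arithmetic behind that estimate, as a rough sketch (it assumes every pool uses the same replica count): each PG is stored on "size" OSDs, so PGs per OSD is the sum of pg_num x size over all pools divided by the OSD count, e.g. 3,520 x 3 / 28 ≈ 377. Something like the following totals it from the OSD map:

$ ceph osd dump | awk '/^pool/ {
      for (i = 1; i <= NF; i++) {
          if ($i == "size")   s = $(i + 1)    # replica count for this pool
          if ($i == "pg_num") p = $(i + 1)    # placement groups in this pool
      }
      total += s * p
  } END { print total " PG copies across all pools" }'
# divide by the OSD count (ceph osd stat) to get the per-OSD figure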

Re: [ceph-users] too many PGs per OSD when pg_num = 256??

2016-09-22 Thread Andrus, Brian Contractor
Hmm. Something happened then. I only have 20 OSDs. What may cause that? Brian Andrus

Re: [ceph-users] too many PGs per OSD when pg_num = 256??

2016-09-22 Thread David Turner
Nothing weird, I have incomplete data and data based on bad rounding errors. Your cluster has too many pgs and most of your pools will likely need to be recreated with fewer. Have you poked around on the pg calc tool?

Re: [ceph-users] [EXTERNAL] Re: jewel blocked requests

2016-09-22 Thread WRIGHT, JON R (JON R)
Thanks for responding, and I apologize for taking so long to get back to you. But all the osds are ceph version 10.2.2 The osd with the bad disk is down and out of the cluster. The disk may get replaced today. I'm still getting blocked request messages, but at a significantly lower rate.

[ceph-users] Question on RGW MULTISITE and librados

2016-09-22 Thread Paul Nimbley
Fairly new to ceph so please excuse any misused terminology. We're currently exploring the use of ceph as a replacement storage backend for an existing application. The existing application has 2 requirements which seemingly can be met individually by using librados and the Ceph Object Gateway

[ceph-users] rbd pool:replica size choose: 2 vs 3

2016-09-22 Thread Zhongyan Gu
Hi there, the default rbd pool replica size is 3. However, I found that in our all-SSD environment capacity becomes a cost issue, and we want to save more of it, so one option is to change the replica size from 3 to 2. Can anyone share their experience of the pros and cons of replica size 2 vs 3? than
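For reference, the change itself is one command per pool, and min_size is the companion setting that decides how many copies must be up for I/O to proceed (pool name and values here are only an illustration; size 2 trades redundancy for capacity):

$ ceph osd pool set rbd size 2
$ ceph osd pool set rbd min_size 1   # with size 2, a min_size of 2 would block I/O whenever one replica is down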

[ceph-users] Ceph repo is broken, no repodata at all

2016-09-22 Thread Chengwei Yang
Hi list, I found that the ceph repo has been broken these days; there is no repodata in the repo at all. http://us-east.ceph.com/rpm-jewel/el7/x86_64/repodata/ is just empty, so how can I install Ceph rpms with yum? As a workaround I synced all the rpms locally and created the repodata with the createrepo command,
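A sketch of that workaround, assuming wget and createrepo are available (the URL is the one quoted above; --cut-dirs=3 matches its three path components):

$ wget -r -np -nH --cut-dirs=3 -A '*.rpm' http://us-east.ceph.com/rpm-jewel/el7/x86_64/
$ createrepo .
# then point a local .repo file (baseurl=file:///path/to/this/directory) at it and run yum install ceph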