Re: [ceph-users] PGs stuck activating after adding new OSDs

2018-03-28 Thread Jakub Jaszewski
Hi Jon, can you reweight one OSD to the default value and share the outcome of "ceph osd df tree; ceph -s; ceph health detail"? Recently I was adding a new node, 12x 4TB, one disk at a time, and faced the activating+remapped state for a few hours. Not sure, but maybe that was caused by the "osd_max_backfills" value
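The diagnostics Jakub asks for can be collected in one pass. A sketch, assuming a running cluster; `osd.12` and the crush weight 3.64 (typical for a 4 TB disk) are placeholder values, and 2 is only an example backfill limit:

```shell
# Reweight one OSD back to its default crush weight
# (osd.12 and 3.64 are placeholders for your cluster):
ceph osd crush reweight osd.12 3.64

# Capture the state requested above:
ceph osd df tree
ceph -s
ceph health detail

# If PGs sit in activating+remapped, the backfill throttle
# may be worth checking or raising (2 is only an example):
ceph tell 'osd.*' injectargs '--osd_max_backfills 2'
```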

Re: [ceph-users] 1 mon unable to join the quorum

2018-03-28 Thread Brad Hubbard
Can you update with the result of the following commands from all of the MONs? # ceph --admin-daemon /var/run/ceph/ceph-mon.[whatever].asok mon_status # ceph --admin-daemon /var/run/ceph/ceph-mon.[whatever].asok quorum_status On Thu, Mar 29, 2018 at 3:11 PM, Gauvain Pocentek wrote: > Hello Ceph
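Brad's two commands can be looped over every MON's admin socket. A sketch, assuming the default socket paths under /var/run/ceph and a live monitor on the host it runs on:

```shell
# Query mon_status and quorum_status on each local MON admin socket.
for sock in /var/run/ceph/ceph-mon.*.asok; do
    echo "== $sock =="
    ceph --admin-daemon "$sock" mon_status
    ceph --admin-daemon "$sock" quorum_status
done
```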

[ceph-users] 1 mon unable to join the quorum

2018-03-28 Thread Gauvain Pocentek
Hello Ceph users, We are having a problem on a ceph cluster running Jewel: one of the mons left the quorum, and we have not been able to make it join again. The two other monitors are running just fine, but obviously we need this third one. The problem happened before Jewel, when the cluste

Re: [ceph-users] Random individual OSD failures with "connection refused reported by" another OSD?

2018-03-28 Thread Kjetil Joergensen
Hi, another possibility - the OSDs "refusing connections" crashed; there's a window of time where connection attempts will fail with connection refused, between the moment the OSD died, the OSD being re-started by upstart/systemd, and the OSD getting far enough into its init process to start listening for new

Re: [ceph-users] session lost, hunting for new mon / session established : every 30s until unmount/remount

2018-03-28 Thread Jean-Charles Lopez
Hi, if I read you correctly you have 3 MONs in each data center. This means that when the link goes down you will lose quorum, making the cluster unavailable. If my perception is correct, you’d have to start a 7th MON somewhere else, accessible from both sites, for your cluster to maintain quorum

[ceph-users] session lost, hunting for new mon / session established : every 30s until unmount/remount

2018-03-28 Thread Nicolas Huillard
Hi all, I didn't find much information regarding this kernel client loop in the ML. Here are my observations, around which I'll try to investigate. My setup: * 2 datacenters connected using an IPsec tunnel configured for routing (2 subnets) * connection to the WAN using PPPoE and the pppd kernel m

Re: [ceph-users] Random individual OSD failures with "connection refused reported by" another OSD?

2018-03-28 Thread Andre Goree
On 2018/03/28 1:39 pm, Subhachandra Chandra wrote: We have seen similar behavior when there are network issues. AFAIK, the OSD is being reported down by an OSD that cannot reach it. But either another OSD that can reach it or the heartbeat between the OSD and the monitor declares it up. The OS

Re: [ceph-users] Random individual OSD failures with "connection refused reported by" another OSD?

2018-03-28 Thread Subhachandra Chandra
We have seen similar behavior when there are network issues. AFAIK, the OSD is being reported down by an OSD that cannot reach it. But either another OSD that can reach it or the heartbeat between the OSD and the monitor declares it up. The OSD "boot" message does not seem to indicate an actual OSD

Re: [ceph-users] ceph mds memory usage 20GB : is it normal ?

2018-03-28 Thread Alexandre DERUMIER
>>Can you also share `ceph daemon mds.2 cache status`, the full `ceph daemon mds.2 perf dump`, and `ceph status`? Sorry, too late, I needed to restart the mds daemon because I was out of memory :( Seems stable for now (around 500MB). Not sure it was related, but I had a ganesha-nfs ->cephfs

[ceph-users] Random individual OSD failures with "connection refused reported by" another OSD?

2018-03-28 Thread Andre Goree
Hello, I've recently had a minor issue come up where random individual OSDs are failed due to a connection refused on another OSD. I say minor, because it's not a node-wide issue, and it appears to be random nodes -- and besides that, the OSD comes up within less than a second, as if the OSD is sent

Re: [ceph-users] where is it possible download CentOS 7.5

2018-03-28 Thread Max Cuttins
Hi Jason, I really don't want to stress this more than I already did. But I need to have a clear answer. On 28/03/2018 13:36, Jason Dillaman wrote: But I don't think that CentOS 7.5 will use the kernel 4.16 ... so you are telling me that the new feature will be backported to the kernel 3.* ?

Re: [ceph-users] Updating standby mds from 12.2.2 to 12.2.4 caused up:active 12.2.2 mds's to suicide

2018-03-28 Thread adrien.geor...@cc.in2p3.fr
Hmm, looks like I restarted everything except the MDS... So it's the same issue! That's why the MDSs killed themselves during the reboot of one of the monitors, with the MDS in 12.2.2. Thanks Dan! Adrien On 28/03/2018 16:43, Dan van der Ster wrote: Do you have the startup banners for mds.cccephadm14

Re: [ceph-users] What do you use to benchmark your rgw?

2018-03-28 Thread Janne Johansson
2018-03-28 16:21 GMT+02:00 David Byte : > I use cosbench (the last rc works well enough). I can get multiple GB/s > from my 6 node cluster with 2 RGWs. > > > To add info to this, it's not unexpectedly low for us; we know the S3+https layer adds latencies, and it is EC pools on cheap+large spin di

Re: [ceph-users] What do you use to benchmark your rgw?

2018-03-28 Thread Mark Nelson
Personally I usually use a modified version of Mark Seger's getput tool here: https://github.com/markhpc/getput/tree/wip-fix-timing The difference between this version and upstream is primarily to make getput more accurate/useful when using something like CBT for orchestration instead of the

Re: [ceph-users] Updating standby mds from 12.2.2 to 12.2.4 caused up:active 12.2.2 mds's to suicide

2018-03-28 Thread Dan van der Ster
Do you have the startup banners for mds.cccephadm14 and 15? It sure looks like they were running 12.2.2 with the "not writeable with daemon features" error. -- dan On Wed, Mar 28, 2018 at 3:12 PM, adrien.geor...@cc.in2p3.fr wrote: > Hi, > > All Ceph services were in 12.2.4 version. > > Adrien >

Re: [ceph-users] What do you use to benchmark your rgw?

2018-03-28 Thread David Byte
I use cosbench (the last rc works well enough). I can get multiple GB/s from my 6 node cluster with 2 RGWs. David Byte Sr. Technical Strategist IHV Alliances and Embedded SUSE Sent from my iPhone. Typos are Apple's fault. On Mar 28, 2018, at 5:26 AM, Janne Johansson mailto:icepic...@gmail.com>

Re: [ceph-users] Updating standby mds from 12.2.2 to 12.2.4 caused up:active 12.2.2 mds's to suicide

2018-03-28 Thread adrien.geor...@cc.in2p3.fr
Hi, All Ceph services were in version 12.2.4. Adrien On 28/03/2018 14:47, Dan van der Ster wrote: Hi, Which versions were those MDS's before and after the restarted standby MDS? Cheers, Dan On Wed, Mar 28, 2018 at 11:11 AM, adrien.geor...@cc.in2p3.fr wrote: Hi, I just had the same

Re: [ceph-users] Updating standby mds from 12.2.2 to 12.2.4 caused up:active 12.2.2 mds's to suicide

2018-03-28 Thread Dan van der Ster
Hi, Which versions were those MDS's before and after the restarted standby MDS? Cheers, Dan On Wed, Mar 28, 2018 at 11:11 AM, adrien.geor...@cc.in2p3.fr wrote: > Hi, > > I just had the same issue with our 12.2.4 cluster but not during the > upgrade. > One of our 3 monitors restarted (the one

Re: [ceph-users] Fwd: High IOWait Issue

2018-03-28 Thread Alex Gorbachev
On Mon, Mar 26, 2018 at 11:40 PM, Sam Huracan wrote: > Hi, > > We are using RAID cache mode Writeback for the SSD journal; I consider this > the reason utilization of the SSD journal is so low. > Is it true? Anybody with experience in this matter, please confirm. > I turn the writeback mode off for th

Re: [ceph-users] What do you use to benchmark your rgw?

2018-03-28 Thread Janne Johansson
s3cmd and cli version of cyberduck to test it end-to-end using parallelism if possible. Getting some 100MB/s at most, from 500km distance over https against 5*radosgw behind HAProxy. 2018-03-28 11:17 GMT+02:00 Matthew Vernon : > Hi, > > What are people here using to benchmark their S3 service (

Re: [ceph-users] Getting a public file from radosgw

2018-03-28 Thread Marc Roos
Yes great!!! Thanks https://192.168.1.114:7480/test:test/test.txt -Original Message- From: Matt Benjamin [mailto:mbenj...@redhat.com] Sent: woensdag 28 maart 2018 14:23 To: Marc Roos Cc: ceph-users Subject: Re: [ceph-users] Getting a public file from radosgw Hi Marc, It looks to m

Re: [ceph-users] Getting a public file from radosgw

2018-03-28 Thread Matt Benjamin
Hi Marc, It looks to me, from the naming of the test users here, as if you are being guided by the information here: http://docs.ceph.com/docs/master/radosgw/multitenancy/ which I think is the right starting point. The distinction is that the "test" in "test$tester1" and "test2" in "test2$tes

Re: [ceph-users] Getting a public file from radosgw

2018-03-28 Thread Marc Roos
I created these users with radosgw-admin [ "test2$tester3", "test$tester1", "test$tester2" ] test$tester1 has created bucket test, and in this bucket file test.txt test$tester2 cannot create a bucket test test2$tester3 has created bucket test, and in this bucket file test.txt Ther

Re: [ceph-users] Getting a public file from radosgw

2018-03-28 Thread Wido den Hollander
On 03/28/2018 12:53 PM, Marc Roos wrote: > There must be in the wget/curl url some userid/unique identification > not? Otherwise it could be anybodies test bucket/file? Let's say you have 'objects.local' as hostname. You can then fetch the object: - http://test.objects.local/test.txt - http:/
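Wido's two URL forms differ only in where the bucket name goes. A runnable sketch of the string construction; the 'objects.local' hostname and 'test' bucket/object names come from his example:

```shell
#!/bin/sh
host="objects.local"   # radosgw hostname from Wido's example
bucket="test"
object="test.txt"

# Virtual-hosted style: bucket as a subdomain of the rgw hostname.
vhost_url="http://${bucket}.${host}/${object}"
# Path style: bucket as the first path segment.
path_url="http://${host}/${bucket}/${object}"

echo "$vhost_url"
echo "$path_url"
```

The virtual-hosted style requires DNS (or a wildcard record) resolving `*.objects.local` to the radosgw, plus `rgw_dns_name` set accordingly; the path style works without it.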

Re: [ceph-users] where is it possible download CentOS 7.5

2018-03-28 Thread Jason Dillaman
On Wed, Mar 28, 2018 at 7:33 AM, Brad Hubbard wrote: > On Wed, Mar 28, 2018 at 6:53 PM, Max Cuttins wrote: >> Il 27/03/2018 13:46, Brad Hubbard ha scritto: >> >> >> >> On Tue, Mar 27, 2018 at 9:12 PM, Max Cuttins wrote: >>> >>> Hi Brad, >>> >>> that post was mine. I knew it quite well. >>> >

Re: [ceph-users] where is it possible download CentOS 7.5

2018-03-28 Thread Brad Hubbard
On Wed, Mar 28, 2018 at 6:53 PM, Max Cuttins wrote: > Il 27/03/2018 13:46, Brad Hubbard ha scritto: > > > > On Tue, Mar 27, 2018 at 9:12 PM, Max Cuttins wrote: >> >> Hi Brad, >> >> that post was mine. I knew it quite well. >> >> That Post was about confirm the fact that minimum requirements w

Re: [ceph-users] Getting a public file from radosgw

2018-03-28 Thread Marc Roos
There must be some userid/unique identification in the wget/curl URL, no? Otherwise it could be anybody's test bucket/file? [marc@os0 ~]$ s3cmd info s3://test/test.txt s3://test/test.txt (object): File size: 15 Last mod: Wed, 28 Mar 2018 10:49:00 GMT MIME type: text/plain Storage:

Re: [ceph-users] Getting a public file from radosgw

2018-03-28 Thread Wido den Hollander
On 03/28/2018 11:59 AM, Marc Roos wrote: > Do you have maybe some pointers, or an example? ;) When you upload using s3cmd, try the -P flag; that will set the public-read ACL. Wido > This XML file does not appear to have any style information associated > with it. The doc
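A sketch of the `-P` flag in use; the bucket, file, and hostname are placeholders, and it assumes a reachable radosgw with s3cmd already configured:

```shell
# Upload with a public-read ACL in one step:
s3cmd put -P test.txt s3://test/test.txt

# Or make an already-uploaded object public afterwards:
s3cmd setacl --acl-public s3://test/test.txt

# Anyone can now fetch it without credentials:
curl -s http://test.objects.local/test.txt
```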

Re: [ceph-users] Getting a public file from radosgw

2018-03-28 Thread Marc Roos
Do you have maybe some pointers, or an example? ;) This XML file does not appear to have any style information associated with it. The document tree is shown below. NoSuchBucket, bucket "test", request id tx000bb-005abb6720-2d8dc8-default, host id 2d8dc8-default-default -Original Message- Fr

[ceph-users] Multipart Failure SOLVED - Missing Pool not created automatically

2018-03-28 Thread Ingo Reimann
Hi all, I was able to track down the problem: our zone config contains a default-placement with a value for "data_extra_pool". This pool, e.g. "dev.rgw.buckets.non-ec", did not exist and could not be created automatically. Logs show: 2018-03-28 11:26:33.151533 7f569b30e700 1 -- 10.197.115.31:0/
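If radosgw cannot create the extra-data pool itself, creating it by hand should unblock multipart uploads. A sketch; the pool name comes from Ingo's zone config, and the PG count of 8 is a placeholder to be sized for your cluster:

```shell
# Create the missing non-EC extra-data pool manually:
ceph osd pool create dev.rgw.buckets.non-ec 8 8

# On Luminous and later, tag the pool for rgw use:
ceph osd pool application enable dev.rgw.buckets.non-ec rgw
```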

[ceph-users] Getting a public file from radosgw

2018-03-28 Thread Marc Roos
Is it possible to get a file directly from a bucket without authenticating, with something like wget https://radosgw.example.com/user/bucket/file or https://radosgw.example.com/uniqueid?

[ceph-users] Upgrading ceph and mapped rbds

2018-03-28 Thread Götz Reinicke
Hi, I bet I read it somewhere already, but can’t remember where…. Our ceph 10.2 cluster is fine and healthy, and I have a couple of rbds exported to some fileservers and an nfs server. The upgrade to v12.2 documentation is clear regarding upgrading/restarting all MONs first; after that, the O
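The documented Jewel-to-Luminous order is MONs first, then OSDs, restarted one host at a time. A sketch of the sequence; it assumes systemd packaging, and client-side mapped rbds are typically unaffected by rolling daemon restarts on the cluster side:

```shell
# On each MON host, one at a time, after installing the 12.2 packages:
systemctl restart ceph-mon.target
ceph mon versions            # wait until every mon reports 12.2.x

# Then on each OSD host, one at a time:
systemctl restart ceph-osd.target
ceph osd versions

# Once every OSD runs Luminous:
ceph osd require-osd-release luminous
```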

Re: [ceph-users] What is in the mon leveldb?

2018-03-28 Thread Wido den Hollander
On 03/28/2018 01:34 AM, Tracy Reed wrote: >> health: HEALTH_WARN >> recovery 1230/13361271 objects misplaced (0.009%) >> >> and no recovery is happening. I'm not sure why. This hasn't happened >> before. But the mon db had been growing since long before this >> circumstance. > > Hmm.

[ceph-users] What do you use to benchmark your rgw?

2018-03-28 Thread Matthew Vernon
Hi, What are people here using to benchmark their S3 service (i.e. the rgw)? rados bench is great for some things, but doesn't tell me about what performance I can get from my rgws. It seems that there used to be rest-bench, but that isn't in Jewel AFAICT; I had a bit of a look at cosbench but it

Re: [ceph-users] Updating standby mds from 12.2.2 to 12.2.4 caused up:active 12.2.2 mds's to suicide

2018-03-28 Thread adrien.geor...@cc.in2p3.fr
Hi, I just had the same issue with our 12.2.4 cluster but not during the upgrade. One of our 3 monitors restarted (the one with a standby MDS) and the 2 others active MDS killed themselves : 2018-03-28 09:36:24.376888 7f910bc0f700  0 mds.cccephadm14 handle_mds_map mdsmap compatset compat={},

Re: [ceph-users] where is it possible download CentOS 7.5

2018-03-28 Thread Max Cuttins
On 27/03/2018 13:46, Brad Hubbard wrote: On Tue, Mar 27, 2018 at 9:12 PM, Max Cuttins wrote: Hi Brad, that post was mine. I knew it quite well. That post was about confirming the fact that the minimum requirements written in the documentation r

Re: [ceph-users] MDS Bug/Problem

2018-03-28 Thread Perrin, Christopher (zimkop1)
Hi, it is possible that I have extracted the wrong log message. I will look into that. What happened is that out of the blue all MDSs started failing. Only after many failed starting attempts, with the OSDs blocking "old" messages, did I reset the journal. After the MDSs were running again we had sever

Re: [ceph-users] Radosgw ldap info

2018-03-28 Thread Marc Roos
I think I have something wrong in my ldap setup; I cannot get radosgw-admin user info --uid for ldap users. So I have to fix this first. -Original Message- From: Benjeman Meekhof [mailto:bmeek...@umich.edu] Sent: maandag 26 maart 2018 18:17 To: ceph-users Subject: Re: [ceph-users] Radosgw l

Re: [ceph-users] Group-based permissions issue when using ACLs on CephFS

2018-03-28 Thread Yan, Zheng
On Tue, Mar 27, 2018 at 12:16 AM, Josh Haft wrote: > Here's what I'm seeing using basic owner/group permissions. Both > directories are mounted on my NFS client with the same options. Only > difference is underneath, from the NFS server, 'aclsupport' is mounted > via ceph-fuse with fuse_default_pe