Re: [ceph-users] how to update old pre ceph-deploy osds to current systemd way?

2018-01-17 Thread David Turner
The partition type code for a data partition is
4FBD7E29-9D25-41B8-AFD0-062C0CEFF05D
and the partition type code for a journal is
45B0969E-9B03-4F30-B4C6-5EC00CEFF106.
That will fix your udev rules and probably make systemctl recognize them.
To remove the entries from your ceph.conf, get the OSDs mounting in their
default location, /var/lib/ceph/osd/ceph-#/, and make sure that the journal
symlink in each OSD points to the proper journal device. It's best to point
that symlink at something that persists across reboots, like
/dev/disk/by-uuid/{guid}.
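As a sketch (not part of the original mail; device names, partition numbers
and the OSD id are placeholders to be checked against your own layout), the
fixes described above might look like:

  # tag the data and journal partitions with the type codes udev expects
  sgdisk --typecode=1:4FBD7E29-9D25-41B8-AFD0-062C0CEFF05D /dev/sdX
  sgdisk --typecode=2:45B0969E-9B03-4F30-B4C6-5EC00CEFF106 /dev/sdY
  partprobe /dev/sdX /dev/sdY

  # mount the OSD in its default location and point the journal symlink at a
  # persistent device path, then drop the osd_data/osd_journal lines from ceph.conf
  mount /dev/sdX1 /var/lib/ceph/osd/ceph-12
  ln -sf /dev/disk/by-partuuid/<journal-guid> /var/lib/ceph/osd/ceph-12/journal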

On Wed, Jan 17, 2018 at 11:22 AM Smith, Eric  wrote:

> We had to update the OS / kernel, chown all the data to ceph:ceph, and
> update the partition type codes on both the OSDs and journals. After this
> udev and systemd brought them up automatically.
>
>
> --
> *From:* ceph-users  on behalf of
> Stefan Priebe - Profihost AG 
> *Sent:* Wednesday, January 17, 2018 10:45 AM
> *To:* ceph-users@lists.ceph.com
> *Subject:* [ceph-users] how to update old pre ceph-deploy osds to current
> systemd way?
>
> Hello,
>
> i've some osds which were created under bobtail or argonaut (pre
> ceph-deploy).
>
> Those are not recognized as a ceph-osd@57.service . Also they have an
> entry in the ceph.conf:
>
> [osd.12]
> host=1336
> osd_data = /ceph/osd.$id/
> osd_journal = /dev/disk/by-partlabel/journal$id
>
> Is there any way to migrate them?
>
> Greets,
> Stefan
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] data_digest_mismatch_oi with missing object and I/O errors (repaired!)

2018-01-17 Thread Brian Andrus
We recently had a few inconsistent PGs crop up on one of our clusters, and
I wanted to describe the process used to repair them for review and perhaps
to help someone in the future.

Our state roughly matched David's described comment here:

http://tracker.ceph.com/issues/21388#note-1

However, we were missing the object entirely on the primary OSD. This may
have been due to previous manual repair attempts, but the exact cause of
the missing object is unclear.

In order to get the PG into a state consistent with David's comment, I
exported the perceived "good" copy of the PG using ceph-objectstore-tool
and imported it to the primary OSD.

At this point, a repair would consistently cause an empty listing in "rados
list-inconsistent-obj" (but still inconsistent), and a deep-scrub would
cause the "list-inconsistent-obj" state to appear as David described.
However, "rados get" resulted in I/O errors.

I again used ceph-objectstore-tool with the "get-bytes" option to dump the
object contents to a file and "rados put" that.
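As a rough sketch of the commands involved (placeholder pool/PG/object names;
the OSD must be stopped while ceph-objectstore-tool runs, and exact flags can
differ between filestore and bluestore):

  # export the "good" copy of the PG from a replica and import it on the primary
  ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-<id> \
      --pgid <pgid> --op export --file /tmp/pg.export
  ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-<primary-id> \
      --op import --file /tmp/pg.export

  # dump the object's bytes from the OSD and write them back via librados
  ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-<id> \
      --pgid <pgid> '<object-name>' get-bytes > /tmp/object.bin
  rados -p <pool> put <object-name> /tmp/object.bin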

It seems to have worked and the customer's VM hasn't noticed anything awry
yet... though it hadn't noticed anything before this either. The right data
appears to be in place and the PG is consistent after a deep-scrub.

Pretty standard stuff, but it might help others as an alternative way of
dumping byte data in the future, as long as no one sees an issue with this
approach. I see at least one other person reporting the same I/O error on the
bug.

--
Brian Andrus | Cloud Systems Engineer | DreamHost
brian.and...@dreamhost.com | www.dreamhost.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] MDS injectargs

2018-01-17 Thread David Turner
The danger with mds.* and osd.* in commands is that the shell attempts to
expand the * before passing it to the command.  That is to say, if you have a
file or folder called mds.anything in the current directory, you end up
passing mds.anything to the command instead of mds.*.  You can mitigate that
by running it with mds.\* so that the shell doesn't mangle the wildcard
character.
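A quick illustration of the difference, using the commands from this thread
(a sketch):

  # if a file named mds.something exists in the current directory, the shell
  # expands the unquoted glob before ceph ever sees it:
  ceph tell mds.* injectargs '--mds_cache_size 0'   # may become "ceph tell mds.something ..."

  # escaping or quoting the wildcard passes it through to ceph untouched:
  ceph tell mds.\* injectargs '--mds_cache_size 0'
  ceph tell 'mds.*' config set mds_cache_memory_limit 204800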

On Wed, Jan 17, 2018 at 7:30 AM Eugen Block  wrote:

> Ah, I think I got it:
>
> ceph@host1:~> ceph tell mds.* config set mds_cache_memory_limit 204800
> [...]
> mds.host2: Set mds_cache_memory_limit to 204800
> [...]
> mds.host1: Set mds_cache_memory_limit to 204800
>
> :-)
>
>
> Zitat von Florent B :
>
> > Of course, specifying mds.NAME is working (without *), but there's no
> > way to do it on all MDS at the same time ?
> >
> >
> > On 17/01/2018 12:54, Florent B wrote:
> >> That's what I did, I run it on the active MDS server.
> >>
> >> I run 12.2.2 version.
> >>
> >>
> >> On 17/01/2018 12:53, Eugen Block wrote:
> >>> Can you try it on one of your MDS servers? It should work there.
> >>>
> >>> Zitat von Florent B :
> >>>
>  Hi,
> 
>  Thank you but I got :
> 
>  admin_socket: exception getting command descriptions: [Errno 2] No
>  such file or directory
> 
> 
>  On 17/01/2018 12:47, Eugen Block wrote:
> > Hi,
> >
> > try it with
> >
> > ceph daemon mds.* config set mds_cache_size 0
> >
> > Regards,
> > Eugen
> >
> >
> > Zitat von Florent B :
> >
> >> Hi,
> >>
> >> I would like to reset "mds_cache_size" to its default value (0 in
> >> Luminous).
> >>
> >> So I do :
> >>
> >> ceph tell mds.* injectargs '--mds_cache_size 0'
> >>
> >> I also tried :
> >>
> >> ceph tell mds.* injectargs '--mds_cache_size = 0'
> >>
> >> ceph tell mds.* injectargs '--mds_cache_size=0'
> >>
> >> But I always have errors like these for each MDS :
> >>
> >> 2018-01-17 12:39:06.134654 7fdc557fa700  0 client.475461118
> >> ms_handle_reset on 10.111.0.3:6800/2482597610
> >> Error EPERM: problem getting command descriptions from mds.host3
> >> mds.host3: problem getting command descriptions from mds.host3
> >>
> >> My client.admin keyring is :
> >>
> >> client.admin
> >> key: X
> >> auid: 0
> >> caps: [mds] allow
> >> caps: [mgr] allow *
> >> caps: [mon] allow *
> >> caps: [osd] allow *
> >>
> >> How can I change this value without restarting services ?
> >>
> >> Thank you.
> >>
> >> Florent
> >>
> >> ___
> >> ceph-users mailing list
> >> ceph-users@lists.ceph.com
> >> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> >
> >
>  ___
>  ceph-users mailing list
>  ceph-users@lists.ceph.com
>  http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> >>>
> >>>
> >> ___
> >> ceph-users mailing list
> >> ceph-users@lists.ceph.com
> >> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> >
> > ___
> > ceph-users mailing list
> > ceph-users@lists.ceph.com
> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
>
>
> --
> Eugen Block voice   : +49-40-559 51 75
> NDE Netzdesign und -entwicklung AG  fax : +49-40-559 51 77
> Postfach 61 03 15
> D-22423 Hamburg e-mail  : ebl...@nde.ag
>
>  Vorsitzende des Aufsichtsrates: Angelika Mozdzen
>Sitz und Registergericht: Hamburg, HRB 90934
>Vorstand: Jens-U. Mozdzen
> USt-IdNr. DE 814 013 983
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Hiding striped objects from view

2018-01-17 Thread Marc Roos
 

Is there a way to hide the striped objects from view, sort of like with an
rbd-type pool?

[@c01 mnt]# rados ls -p ec21 | head
test2G.img.0023
test2G.img.011c
test2G.img.0028
test2G.img.0163
test2G.img.01e7
test2G.img.008d
test2G.img.0129
test2G.img.0150
test2G.img.010e
test2G.img.014b
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Error message in the logs: "meta sync: ERROR: failed to read mdlog info with (2) No such file or directory"

2018-01-17 Thread Casey Bodley


On 01/15/2018 09:57 AM, Victor Flávio wrote:

Hello,

We have a radosgw cluster (version 12.2.2) in multisite mode. Our 
cluster is formed by one master realm, with one master zonegroup and 
two zones (one of which is the master zone).


We've followed the instructions of Ceph documentation to install and 
configure our cluster.


The cluster works as expected, the objects and users are being 
replicated between the zones, but we are always getting this error 
message in our logs:



2018-01-15 12:25:00.119301 7f68868e5700  1 meta sync: ERROR: failed to 
read mdlog info with (2) No such file or directory



Some details about the error message(s):
 - They are only printed in the non-master zone log;
 - They are only printed when this "slave" zone tries to sync the 
metadata info;
 - In each synchronization cycle of the metadata info, the number of 
these error messages equals the number of metadata log shards;
 - When we run the command "radosgw-admin mdlog list", we get an 
empty array as output in both zones;
 - The output of "radosgw-admin sync status" says everything is ok and 
synced, which is true, despite the mdlog error messages in the log.


Has anyone hit this same problem? And how can it be fixed? I've tried to 
fix it many times and failed.



--
Victor Flávio de Oliveira Santos
Fullstack Developer/DevOps
http://victorflavio.me
Twitter: @victorflavio
Skype: victorflavio.oliveira
Github: https://github.com/victorflavio
Telefone/Phone: +55 62 81616477



___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Hi Victor,

We use a hashing strategy to spread metadata over these mdlog shards. 
It's likely that some shards are empty, especially if there are 
relatively few buckets/users in the system. These 'No such file or 
directory' errors are just trying to read from shard objects that 
haven't ever been written to. Logging them as noisy ERROR messages is 
certainly misleading, but it's probably nothing to worry about.


Casey
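As a quick sanity check for this benign case, the two commands already
mentioned in the thread (shown here as a sketch):

  radosgw-admin mdlog list      # an empty list just means no pending metadata changes
  radosgw-admin sync status     # the authoritative view of replication health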
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] filestore to bluestore: osdmap epoch problem and is the documentation correct?

2018-01-17 Thread Dan Jakubiec
Also worth pointing out something a bit obvious: this kind of 
faster/destructive migration should only be attempted if all your pools are at 
least 3x replicated.

For example, if you had a 1x replicated pool you would lose data using this 
approach.

-- Dan
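
A quick pre-flight check along these lines (a sketch, not from Dan's mail):

  ceph osd pool ls detail   # confirm the "size" / "min_size" of every pool
  ceph health               # and make sure the cluster is healthy to begin with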

> On Jan 11, 2018, at 14:24, Reed Dier  wrote:
> 
> Thank you for documenting your progress and peril on the ML.
> 
> Luckily I only have 24x 8TB HDD and 50x 1.92TB SSDs to migrate over to 
> bluestore.
> 
> 8 nodes, 4 chassis (failure domain), 3 drives per node for the HDDs, so I’m 
> able to do about 3 at a time (1 node) for rip/replace.
> 
> Definitely taking it slow and steady, and the SSDs will move quickly for 
> backfills as well.
> Seeing about 1TB/6hr on backfills, without much performance hit on rest of 
> everything, about 5TB average util on each 8TB disk, so just about 30 
> hours-ish per host *8 hosts will be about 10 days, so a couple weeks is a 
> safe amount of headway.
> This write performance certainly seems better on bluestore than filestore, so 
> that likely helps as well.
> 
> Expect I can probably refill an SSD osd in about an hour or two, and will 
> likely stagger those out.
> But with such a small number of osd’s currently, I’m taking the by-hand 
> approach rather than scripting it so as to avoid similar pitfalls.
> 
> Reed 
> 
>> On Jan 11, 2018, at 12:38 PM, Brady Deetz > > wrote:
>> 
>> I hear you on time. I have 350 x 6TB drives to convert. I recently posted 
>> about a disaster I created automating my migration. Good luck
>> 
>> On Jan 11, 2018 12:22 PM, "Reed Dier" > > wrote:
>> I am in the process of migrating my OSDs to bluestore finally and thought I 
>> would give you some input on how I am approaching it.
>> Some of saga you can find in another ML thread here: 
>> https://www.spinics.net/lists/ceph-users/msg41802.html 
>> 
>> 
>> My first OSD I was cautious, and I outed the OSD without downing it, 
>> allowing it to move data off.
>> Some background on my cluster, for this OSD, it is an 8TB spinner, with an 
>> NVMe partition previously used for journaling in filestore, intending to be 
>> used for block.db in bluestore.
>> 
>> Then I downed it, flushed the journal, destroyed it, zapped with 
>> ceph-volume, set norecover and norebalance flags, did ceph osd crush remove 
>> osd.$ID, ceph auth del osd.$ID, and ceph osd rm osd.$ID and used ceph-volume 
>> locally to create the new LVM target. Then unset the norecover and 
>> norebalance flags and it backfilled like normal.
>> 
>> I initially ran into issues with specifying --osd.id  
>> causing my osd’s to fail to start, but removing that I was able to get it to 
>> fill in the gap of the OSD I just removed.
>> 
>> I’m now doing quicker, more destructive migrations in an attempt to reduce 
>> data movement.
>> This way I don’t read from OSD I’m replacing, write to other OSD 
>> temporarily, read back from temp OSD, write back to ‘new’ OSD.
>> I’m just reading from replica and writing to ‘new’ OSD.
>> 
>> So I’m setting the norecover and norebalance flags, down the OSD (but not 
>> out, it stays in, also have the noout flag set), destroy/zap, recreate using 
>> ceph-volume, unset the flags, and it starts backfilling.
>> For 8TB disks, and with 23 other 8TB disks in the pool, it takes a long time 
>> to offload it and then backfill back from them. I trust my disks enough to 
>> backfill from the other disks, and its going well. Also seeing very good 
>> write performance backfilling compared to previous drive replacements in 
>> filestore, so thats very promising.
>> 
>> Reed
>> 
>>> On Jan 10, 2018, at 8:29 AM, Jens-U. Mozdzen >> > wrote:
>>> 
>>> Hi Alfredo,
>>> 
>>> thank you for your comments:
>>> 
>>> Zitat von Alfredo Deza >:
 On Wed, Jan 10, 2018 at 8:57 AM, Jens-U. Mozdzen > wrote:
> Dear *,
> 
> has anybody been successful migrating Filestore OSDs to Bluestore OSDs,
> keeping the OSD number? There have been a number of messages on the list,
> reporting problems, and my experience is the same. (Removing the existing
> OSD and creating a new one does work for me.)
> 
> I'm working on an Ceph 12.2.2 cluster and tried following
> http://docs.ceph.com/docs/master/rados/operations/add-or-rm-osds/#replacing-an-osd
>  
> 
> - this basically says
> 
> 1. destroy old OSD
> 2. zap the disk
> 3. prepare the new OSD
> 4. activate the new OSD
> 
> I never got step 4 to complete. The closest I got was by doing the 
> following
> steps (assuming OSD ID "999" on /dev/sdzz):

Re: [ceph-users] After Luminous upgrade: ceph-fuse clients failing to respond to cache pressure

2018-01-17 Thread John Spray
On Wed, Jan 17, 2018 at 3:36 PM, Andras Pataki
 wrote:
> Hi John,
>
> All our hosts are CentOS 7 hosts, the majority are 7.4 with kernel
> 3.10.0-693.5.2.el7.x86_64, with fuse 2.9.2-8.el7.  We have some hosts that
> have slight variations in kernel versions, the oldest ones are a handful of
> CentOS 7.3 hosts with kernel 3.10.0-514.21.1.el7.x86_64 and fuse
> 2.9.2-7.el7.  I know Redhat has been backporting lots of stuff so perhaps
> these kernels fall into the category you are describing?

Quite possibly -- this issue was originally noticed on RHEL, so maybe
the relevant bits made it back to CentOS recently.

However, it looks like the fixes for that issue[1,2] are already in
12.2.2, so maybe this is something completely unrelated :-/

The ceph-fuse executable does create an admin command socket in
/var/run/ceph (named something like ceph-client...) that you can drive with
"ceph daemon <name> dump_cache", but the output is extremely verbose
and low level and may not be informative.

John

1. http://tracker.ceph.com/issues/21423
2. http://tracker.ceph.com/issues/22269
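
Driving that socket looks roughly like this (a sketch; the socket file name
includes the client name and PID, so adjust the path to whatever actually
exists under /var/run/ceph on the client host):

  ls /var/run/ceph/                                                  # find the ceph-client.*.asok file
  ceph daemon /var/run/ceph/ceph-client.admin.12345.asok help        # list available commands
  ceph daemon /var/run/ceph/ceph-client.admin.12345.asok dump_cache  # very verbose, as noted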

>
> When the cache pressure problem happens, is there a way to know exactly
> which hosts are involved, and what items are in their caches easily?
>
> Andras
>
>
>
> On 01/17/2018 06:09 AM, John Spray wrote:
>>
>> On Tue, Jan 16, 2018 at 8:50 PM, Andras Pataki
>>  wrote:
>>>
>>> Dear Cephers,
>>>
>>> We've upgraded the back end of our cluster from Jewel (10.2.10) to
>>> Luminous
>>> (12.2.2).  The upgrade went smoothly for the most part, except we seem to
>>> be
>>> hitting an issue with cephfs.  After about a day or two of use, the MDS
>>> start complaining about clients failing to respond to cache pressure:
>>
>> What's the OS, kernel version and fuse version on the hosts where the
>> clients are running?
>>
>> There have been some issues with ceph-fuse losing the ability to
>> properly invalidate cached items when certain updated OS packages were
>> installed.
>>
>> Specifically, ceph-fuse checks the kernel version against 3.18.0 to
>> decide which invalidation method to use, and if your OS has backported
>> new behaviour to a low-version-numbered kernel, that can confuse it.
>>
>> John
>>
>>> [root@cephmon00 ~]# ceph -s
>>>cluster:
>>>  id: d7b33135-0940-4e48-8aa6-1d2026597c2f
>>>  health: HEALTH_WARN
>>>  1 MDSs have many clients failing to respond to cache
>>> pressure
>>>  noout flag(s) set
>>>  1 osds down
>>>
>>>services:
>>>  mon: 3 daemons, quorum cephmon00,cephmon01,cephmon02
>>>  mgr: cephmon00(active), standbys: cephmon01, cephmon02
>>>  mds: cephfs-1/1/1 up  {0=cephmon00=up:active}, 2 up:standby
>>>  osd: 2208 osds: 2207 up, 2208 in
>>>   flags noout
>>>
>>>data:
>>>  pools:   6 pools, 42496 pgs
>>>  objects: 919M objects, 3062 TB
>>>  usage:   9203 TB used, 4618 TB / 13822 TB avail
>>>  pgs: 42470 active+clean
>>>   22active+clean+scrubbing+deep
>>>   4 active+clean+scrubbing
>>>
>>>io:
>>>  client:   56122 kB/s rd, 18397 kB/s wr, 84 op/s rd, 101 op/s wr
>>>
>>> [root@cephmon00 ~]# ceph health detail
>>> HEALTH_WARN 1 MDSs have many clients failing to respond to cache
>>> pressure;
>>> noout flag(s) set; 1 osds down
>>> MDS_CLIENT_RECALL_MANY 1 MDSs have many clients failing to respond to
>>> cache
>>> pressure
>>>  mdscephmon00(mds.0): Many clients (103) failing to respond to cache
>>> pressureclient_count: 103
>>> OSDMAP_FLAGS noout flag(s) set
>>> OSD_DOWN 1 osds down
>>>  osd.1296 (root=root-disk,pod=pod0-disk,host=cephosd008-disk) is down
>>>
>>>
>>> We are using exclusively the 12.2.2 fuse client on about 350 nodes or so
>>> (out of which it seems 100 are not responding to cache pressure in this
>>> log).  When this happens, clients appear pretty sluggish also (listing
>>> directories, etc.).  After bouncing the MDS, everything returns on normal
>>> after the failover for a while.  Ignore the message about 1 OSD down,
>>> that
>>> corresponds to a failed drive and all data has been re-replicated since.
>>>
>>> We were also using the 12.2.2 fuse client with the Jewel back end before
>>> the
>>> upgrade, and have not seen this issue.
>>>
>>> We are running with a larger MDS cache than usual, we have mds_cache_size
>>> set to 4 million.  All other MDS configs are the defaults.
>>>
>>> Is this a known issue?  If not, any hints on how to further diagnose the
>>> problem?
>>>
>>> Andras
>>>
>>>
>>> ___
>>> ceph-users mailing list
>>> ceph-users@lists.ceph.com
>>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>>
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] how to update old pre ceph-deploy osds to current systemd way?

2018-01-17 Thread Smith, Eric
We had to update the OS / kernel, chown all the data to ceph:ceph, and update 
the partition type codes on both the OSDs and journals. After this udev and 
systemd brought them up automatically.



From: ceph-users  on behalf of Stefan Priebe 
- Profihost AG 
Sent: Wednesday, January 17, 2018 10:45 AM
To: ceph-users@lists.ceph.com
Subject: [ceph-users] how to update old pre ceph-deploy osds to current systemd 
way?

Hello,

i've some osds which were created under bobtail or argonaut (pre
ceph-deploy).

Those are not recognized as a ceph-osd@57.service . Also they have an
entry in the ceph.conf:

[osd.12]
host=1336
osd_data = /ceph/osd.$id/
osd_journal = /dev/disk/by-partlabel/journal$id

Is there any way to migrate them?

Greets,
Stefan
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] how to update old pre ceph-deploy osds to current systemd way?

2018-01-17 Thread Stefan Priebe - Profihost AG
Hello,

i've some osds which were created under bobtail or argonaut (pre
ceph-deploy).

Those are not recognized as a ceph-osd@57.service . Also they have an
entry in the ceph.conf:

[osd.12]
host=1336
osd_data = /ceph/osd.$id/
osd_journal = /dev/disk/by-partlabel/journal$id

Is there any way to migrate them?

Greets,
Stefan
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] filestore to bluestore: osdmap epoch problem and is the documentation correct?

2018-01-17 Thread Jens-U. Mozdzen

Zitat von Jens-U. Mozdzen

Hi Alfredo,

thank you for your comments:

Zitat von Alfredo Deza :

On Wed, Jan 10, 2018 at 8:57 AM, Jens-U. Mozdzen  wrote:

Dear *,

has anybody been successful migrating Filestore OSDs to Bluestore OSDs,
keeping the OSD number? There have been a number of messages on the list,
reporting problems, and my experience is the same. (Removing the existing
OSD and creating a new one does work for me.)

I'm working on an Ceph 12.2.2 cluster and tried following
http://docs.ceph.com/docs/master/rados/operations/add-or-rm-osds/#replacing-an-osd
- this basically says

1. destroy old OSD
2. zap the disk
3. prepare the new OSD
4. activate the new OSD

I never got step 4 to complete. The closest I got was by doing the
following steps (assuming OSD ID "999" on /dev/sdzz):

1. Stop the old OSD via systemd (osd-node # systemctl stop
ceph-osd@999.service)

2. umount the old OSD (osd-node # umount /var/lib/ceph/osd/ceph-999)

3a. if the old OSD was Bluestore with LVM, manually clean up the old OSD's
volume group

3b. zap the block device (osd-node # ceph-volume lvm zap /dev/sdzz)

4. destroy the old OSD (osd-node # ceph osd destroy 999
--yes-i-really-mean-it)

5. create a new OSD entry (osd-node # ceph osd new $(cat
/var/lib/ceph/osd/ceph-999/fsid) 999)


Steps 5 and 6 are problematic if you are going to be trying ceph-volume
later on, which takes care of doing this for you.



6. add the OSD secret to Ceph authentication (osd-node # ceph auth add
osd.999 mgr 'allow profile osd' osd 'allow *' mon 'allow profile osd' -i
/var/lib/ceph/osd/ceph-999/keyring)


I at first tried to follow the documented steps (without my steps 5
and 6), which did not work for me. The documented approach failed with
"init authentication >> failed: (1) Operation not permitted", because
actually ceph-volume did not add the auth entry for me.

But even after manually adding the authentication, the "ceph-volume"
approach failed, as the OSD was still marked "destroyed" in the osdmap
epoch as used by ceph-osd (see the commented messages from
ceph-osd.999.log below).



7. prepare the new OSD (osd-node # ceph-volume lvm prepare --bluestore
--osd-id 999 --data /dev/sdzz)


You are going to hit a bug in ceph-volume that is preventing you from
specifying the osd id directly if the ID has been destroyed.

See http://tracker.ceph.com/issues/22642


If I read that bug description correctly, you're confirming why I
needed step #6 above (manually adding the OSD auth entry). But even if
ceph-volume had added it, the ceph-osd.log entries suggest that
starting the OSD would still have failed, because it accesses the
wrong osdmap epoch.

To me it seems like I'm hitting a bug outside of ceph-volume [...]


just for the record (and search engines), this was confirmed to be a  
bug, see http://tracker.ceph.com/issues/22673


Regards,
Jens
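
For reference, the full sequence from the quoted messages above, consolidated
into one sketch (placeholders as in the original: OSD id 999 on /dev/sdzz; as
discussed, the final prepare step with a reused id currently fails because of
the tracker issues referenced above):

  systemctl stop ceph-osd@999.service
  umount /var/lib/ceph/osd/ceph-999
  ceph-volume lvm zap /dev/sdzz
  ceph osd destroy 999 --yes-i-really-mean-it
  ceph osd new $(cat /var/lib/ceph/osd/ceph-999/fsid) 999
  ceph auth add osd.999 mgr 'allow profile osd' osd 'allow *' \
      mon 'allow profile osd' -i /var/lib/ceph/osd/ceph-999/keyring
  ceph-volume lvm prepare --bluestore --osd-id 999 --data /dev/sdzz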

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] After Luminous upgrade: ceph-fuse clients failing to respond to cache pressure

2018-01-17 Thread Andras Pataki

Hi John,

All our hosts are CentOS 7 hosts, the majority are 7.4 with kernel 
3.10.0-693.5.2.el7.x86_64, with fuse 2.9.2-8.el7.  We have some hosts 
that have slight variations in kernel versions, the oldest ones are a 
handful of CentOS 7.3 hosts with kernel 3.10.0-514.21.1.el7.x86_64 and 
fuse 2.9.2-7.el7.  I know Redhat has been backporting lots of stuff so 
perhaps these kernels fall into the category you are describing?


When the cache pressure problem happens, is there a way to know exactly 
which hosts are involved, and what items are in their caches easily?


Andras


On 01/17/2018 06:09 AM, John Spray wrote:

On Tue, Jan 16, 2018 at 8:50 PM, Andras Pataki
 wrote:

Dear Cephers,

We've upgraded the back end of our cluster from Jewel (10.2.10) to Luminous
(12.2.2).  The upgrade went smoothly for the most part, except we seem to be
hitting an issue with cephfs.  After about a day or two of use, the MDS
start complaining about clients failing to respond to cache pressure:

What's the OS, kernel version and fuse version on the hosts where the
clients are running?

There have been some issues with ceph-fuse losing the ability to
properly invalidate cached items when certain updated OS packages were
installed.

Specifically, ceph-fuse checks the kernel version against 3.18.0 to
decide which invalidation method to use, and if your OS has backported
new behaviour to a low-version-numbered kernel, that can confuse it.

John


[root@cephmon00 ~]# ceph -s
   cluster:
 id: d7b33135-0940-4e48-8aa6-1d2026597c2f
 health: HEALTH_WARN
 1 MDSs have many clients failing to respond to cache pressure
 noout flag(s) set
 1 osds down

   services:
 mon: 3 daemons, quorum cephmon00,cephmon01,cephmon02
 mgr: cephmon00(active), standbys: cephmon01, cephmon02
 mds: cephfs-1/1/1 up  {0=cephmon00=up:active}, 2 up:standby
 osd: 2208 osds: 2207 up, 2208 in
  flags noout

   data:
 pools:   6 pools, 42496 pgs
 objects: 919M objects, 3062 TB
 usage:   9203 TB used, 4618 TB / 13822 TB avail
 pgs: 42470 active+clean
  22active+clean+scrubbing+deep
  4 active+clean+scrubbing

   io:
 client:   56122 kB/s rd, 18397 kB/s wr, 84 op/s rd, 101 op/s wr

[root@cephmon00 ~]# ceph health detail
HEALTH_WARN 1 MDSs have many clients failing to respond to cache pressure;
noout flag(s) set; 1 osds down
MDS_CLIENT_RECALL_MANY 1 MDSs have many clients failing to respond to cache
pressure
 mdscephmon00(mds.0): Many clients (103) failing to respond to cache
pressureclient_count: 103
OSDMAP_FLAGS noout flag(s) set
OSD_DOWN 1 osds down
 osd.1296 (root=root-disk,pod=pod0-disk,host=cephosd008-disk) is down


We are using exclusively the 12.2.2 fuse client on about 350 nodes or so
(out of which it seems 100 are not responding to cache pressure in this
log).  When this happens, clients appear pretty sluggish also (listing
directories, etc.).  After bouncing the MDS, everything returns to normal
after the failover for a while.  Ignore the message about 1 OSD down, that
corresponds to a failed drive and all data has been re-replicated since.

We were also using the 12.2.2 fuse client with the Jewel back end before the
upgrade, and have not seen this issue.

We are running with a larger MDS cache than usual, we have mds_cache_size
set to 4 million.  All other MDS configs are the defaults.

Is this a known issue?  If not, any hints on how to further diagnose the
problem?

Andras


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Ceph luminous - DELL R620 - performance expectations

2018-01-17 Thread Steven Vacaroaia
Hi,

I have been trying to assess performance on a 3-server cluster for a few days
now.

The most I got you can see below (244 MB/s for 3 OSDs and a 15GB SSD
partition).
With 3 servers, wouldn't it make sense to get almost 3 times the speed of the
hard drive?


Questions:
- what should be the performance I am aiming for?
- how should I configure the SSD cache/controller?
- how should I configure the hard disk cache/controllers?
- any other performance tweaks except a tuned sysctl.conf?
- how does it scale (i.e. 3 servers 600 MB/s, 4 servers 800 MB/s, assuming one
hard drive per server ...)?
- is there a formula / rule-of-thumb approach to estimate performance
  (e.g., if I want to create an SSD-only pool with one drive per server,
what should I expect)? See the rough estimate sketched below.
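
For what it's worth, a rough back-of-the-envelope estimate (a sketch, assuming
a replicated pool with the default size=3 and journals fast enough not to be
the bottleneck):

  aggregate client write MB/s ~= (number of OSDs x per-disk write MB/s) / replication size

With 3 spinning OSDs at roughly 200 MB/s each (per the hdparm numbers below)
and size=3, that works out to about 3 x 200 / 3 ~= 200 MB/s, i.e. roughly one
disk's worth of bandwidth rather than 3x, because every client write has to
land on all three replicas. Growing the aggregate means adding OSDs/servers
while the replication size stays constant.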


MANY THANKS !!!

Configuration
  kernel  3.10.0-693.11.6.el7.x86_64
  bonded 10GB , ixgbe 5.4
  PERC H710 Mini
 /dev/sda is the SSD ( Toshiba 400 GB PX04SHB040 )
/dev/sd[b-f] are 10K Entreprise ( Toshiba 600GB AL13SEB600 )

  [root@osd01 ~]#  megacli -LDGetProp  -Cache -LALL -a0

Adapter 0-VD 0(target id: 0): Cache Policy:WriteThrough, ReadAheadNone,
Direct, No Write Cache if bad BBU
Adapter 0-VD 1(target id: 1): Cache Policy:WriteBack, ReadAdaptive, Direct,
No Write Cache if bad BBU
Adapter 0-VD 2(target id: 2): Cache Policy:WriteBack, ReadAdaptive, Direct,
No Write Cache if bad BBU
Adapter 0-VD 3(target id: 3): Cache Policy:WriteBack, ReadAdaptive, Direct,
No Write Cache if bad BBU
Adapter 0-VD 4(target id: 4): Cache Policy:WriteBack, ReadAdaptive, Direct,
No Write Cache if bad BBU
Adapter 0-VD 5(target id: 5): Cache Policy:WriteBack, ReadAdaptive, Direct,
No Write Cache if bad BBU

Exit Code: 0x00
[root@osd01 ~]#  megacli -LDGetProp  -DskCache -LALL -a0

Adapter 0-VD 0(target id: 0): Disk Write Cache : Disabled
Adapter 0-VD 1(target id: 1): Disk Write Cache : Disk's Default
Adapter 0-VD 2(target id: 2): Disk Write Cache : Disk's Default
Adapter 0-VD 3(target id: 3): Disk Write Cache : Disk's Default
Adapter 0-VD 4(target id: 4): Disk Write Cache : Disk's Default
Adapter 0-VD 5(target id: 5): Disk Write Cache : Disk's Default

 hdparm -tT /dev/sda

/dev/sda:
 Timing cached reads:   20628 MB in  1.99 seconds = 10355.20 MB/sec
 Timing buffered disk reads: 1610 MB in  3.00 seconds = 536.23 MB/sec
[root@osd01 ~]# hdparm -tT /dev/sdb

/dev/sdb:
 Timing cached reads:   19940 MB in  1.99 seconds = 10009.61 MB/sec
 Timing buffered disk reads: 602 MB in  3.01 seconds = 200.27 MB/sec



 /sys/block/sd* settings
   read_ahead_kb = 4096
   scheduler = deadline


 egrep -v "^#|^$" /etc/sysctl.conf
net.ipv4.tcp_sack = 0
net.core.netdev_budget = 600
net.ipv4.tcp_window_scaling = 1
net.core.rmem_max = 16777216
net.core.wmem_max = 16777216
net.core.rmem_default = 16777216
net.core.wmem_default = 16777216
net.core.optmem_max = 40960
net.ipv4.tcp_rmem = 4096 87380 16777216
net.ipv4.tcp_wmem = 4096 65536 16777216
net.ipv4.tcp_syncookies = 0
net.core.somaxconn = 1024
net.core.netdev_max_backlog = 2
net.ipv4.tcp_max_syn_backlog = 3
net.ipv4.tcp_max_tw_buckets = 200
net.ipv4.tcp_tw_reuse = 1
net.ipv4.tcp_slow_start_after_idle = 0
net.ipv4.conf.all.send_redirects = 0
net.ipv4.conf.all.accept_redirects = 0
net.ipv4.conf.all.accept_source_route = 0
vm.min_free_kbytes = 262144
vm.swappiness = 0
vm.vfs_cache_pressure = 100
fs.suid_dumpable = 0
kernel.core_uses_pid = 1
kernel.msgmax = 65536
kernel.msgmnb = 65536
kernel.randomize_va_space = 1
kernel.sysrq = 0
kernel.pid_max = 4194304
fs.file-max = 10


rados bench -p scbench 300 write --no-cleanup && rados bench -p scbench 300
seq



Total time run: 311.478719

Total writes made:  8983

Write size: 4194304

Object size:4194304

Bandwidth (MB/sec): 115.359

Stddev Bandwidth:   67.5735

Max bandwidth (MB/sec): 244

Min bandwidth (MB/sec): 0

Average IOPS:   28

Stddev IOPS:16

Max IOPS:   61

Min IOPS:   0

Average Latency(s): 0.554779

Stddev Latency(s):  1.57807

Max latency(s): 21.0212

Min latency(s): 0.00805304





Total time run:   303.082321

Total reads made: 2558

Read size:4194304

Object size:  4194304

Bandwidth (MB/sec):   33.7598

Average IOPS: 8

Stddev IOPS:  9

Max IOPS: 38

Min IOPS: 0

Average Latency(s):   1.89518

Max latency(s):   52.1244

Min latency(s):   0.0191481
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Bug in RadosGW resharding? Hangs again...

2018-01-17 Thread Orit Wasserman
On Wed, Jan 17, 2018 at 3:16 PM, Martin Emrich 
wrote:

> Hi!
>
>
>
> I created a tracker ticket: http://tracker.ceph.com/issues/22721
>
> It also happens without a lifecycle rule (only versioning).
>
>
>
Thanks.


> I collected a log from the resharding process, after 10 minutes I canceled
> it. Got 500MB log (gzipped still 20MB), so I cannot upload it to the bug
> tracker.
>
>
>
I will try to reproduce it on my setup; it should be simpler now that I am sure
it is the versioning.

Orit


> Regards,
>
>
>
> Martin
>
>
>
>
>
> *Von: *Orit Wasserman 
> *Datum: *Mittwoch, 17. Januar 2018 um 11:57
>
> *An: *Martin Emrich 
> *Cc: *ceph-users 
> *Betreff: *Re: [ceph-users] Bug in RadosGW resharding? Hangs again...
>
>
>
>
>
>
>
> On Wed, Jan 17, 2018 at 11:45 AM, Martin Emrich 
> wrote:
>
> Hi Orit!
>
>
>
> I did some tests, and indeed the combination of Versioning/Lifecycle with
> Resharding is the problem:
>
>
>
>- If I do not enable Versioning/Lifecycle, Autoresharding works fine.
>- If I disable Autoresharding but enable Versioning+Lifecycle, pushing
>data works fine, until I manually reshard. This hangs also.
>
>
>
> Thanks for testing :) This is very helpful!
>
>
>
> My lifecycle rule (which shall remove all versions older than 60 days):
>
>
>
> {
>
> "Rules": [{
>
> "Status": "Enabled",
>
> "Prefix": "",
>
> "NoncurrentVersionExpiration": {
>
> "NoncurrentDays": 60
>
> },
>
> "Expiration": {
>
> "ExpiredObjectDeleteMarker": true
>
> },
>
> "ID": "expire-60days"
>
> }]
>
> }
>
>
>
> I am currently testing with an application containing customer data, but I
> am also creating some random test data to create logs I can share.
>
> I will also test whether the versioning itself is the culprit, or if it is
> the lifecycle rule.
>
>
>
>
>
> I am suspecting versioning (never tried it with resharding).
>
> Can you open a tracker issue with all the information?
>
>
>
> Thanks,
>
> Orit
>
>
>
> Regards,
>
> Martin
>
>
>
> *Von: *Orit Wasserman 
> *Datum: *Dienstag, 16. Januar 2018 um 18:38
> *An: *Martin Emrich 
> *Cc: *ceph-users 
> *Betreff: *Re: [ceph-users] Bug in RadosGW resharding? Hangs again...
>
>
>
> Hi Martin,
>
>
>
> On Mon, Jan 15, 2018 at 6:04 PM, Martin Emrich 
> wrote:
>
> Hi!
>
> After having a completely broken radosgw setup due to damaged buckets, I
> completely deleted all rgw pools, and started from scratch.
>
> But my problem is reproducible. After pushing ca. 10 objects into a
> bucket, the resharding process appears to start, and the bucket is now
> unresponsive.
>
>
>
> Sorry to hear that.
>
> Can you share radosgw logs with --debug_rgw=20 --debug_ms=1?
>
>
>
> I just see lots of these messages in all rgw logs:
>
> 2018-01-15 16:57:45.108826 7fd1779b1700  0 block_while_resharding ERROR:
> bucket is still resharding, please retry
> 2018-01-15 16:57:45.119184 7fd1779b1700  0 NOTICE: resharding operation on
> bucket index detected, blocking
> 2018-01-15 16:57:45.260751 7fd1120e6700  0 block_while_resharding ERROR:
> bucket is still resharding, please retry
> 2018-01-15 16:57:45.280410 7fd1120e6700  0 NOTICE: resharding operation on
> bucket index detected, blocking
> 2018-01-15 16:57:45.300775 7fd15b979700  0 block_while_resharding ERROR:
> bucket is still resharding, please retry
> 2018-01-15 16:57:45.300971 7fd15b979700  0 WARNING: set_req_state_err
> err_no=2300 resorting to 500
> 2018-01-15 16:57:45.301042 7fd15b979700  0 ERROR: 
> RESTFUL_IO(s)->complete_header()
> returned err=Input/output error
>
> One radosgw process and two OSDs housing the bucket index/metadata are
> still busy, but it seems to be stuck again.
>
> How long is this resharding process supposed to take? I cannot believe
> that an application is supposed to block for more than half an hour...
>
> I feel inclined to open a bug report, but I am as yet unsure where the
> problem lies.
>
> Some information:
>
> * 3 RGW processes, 3 OSD hosts with 12 HDD OSDs and 6 SSD OSDs
> * Ceph 12.2.2
> * Auto-Resharding on, Bucket Versioning & Lifecycle rule enabled.
>
>
>
> What life cycle rules do you use?
>
>
>
> Regards,
>
> Orit
>
> Thanks,
>
> Martin
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
>
>
>
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Bug in RadosGW resharding? Hangs again...

2018-01-17 Thread Martin Emrich
Hi!

I created a tracker ticket: http://tracker.ceph.com/issues/22721
It also happens without a lifecycle rule (only versioning).

I collected a log from the resharding process, after 10 minutes I canceled it. 
Got 500MB log (gzipped still 20MB), so I cannot upload it to the bug tracker.

Regards,

Martin
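
One way to keep such a capture manageable (a sketch; the section name
[client.rgw.gateway1] is a placeholder, not from this thread): enable the
debug settings for only one gateway, restart it, reproduce the hang briefly,
then revert and split the compressed log before uploading.

  # in ceph.conf on the gateway being traced
  [client.rgw.gateway1]
      debug rgw = 20
      debug ms = 1

  # after reproducing and re-compressing, split into tracker-sized chunks
  split --bytes=9M radosgw.log.gz radosgw.log.gz.part-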


Von: Orit Wasserman 
Datum: Mittwoch, 17. Januar 2018 um 11:57
An: Martin Emrich 
Cc: ceph-users 
Betreff: Re: [ceph-users] Bug in RadosGW resharding? Hangs again...



On Wed, Jan 17, 2018 at 11:45 AM, Martin Emrich 
> wrote:
Hi Orit!

I did some tests, and indeed the combination of Versioning/Lifecycle with 
Resharding is the problem:


  *   If I do not enable Versioning/Lifecycle, Autoresharding works fine.
  *   If I disable Autoresharding but enable Versioning+Lifecycle, pushing data 
works fine, until I manually reshard. This hangs also.

Thanks for testing :) This is very helpful!

My lifecycle rule (which shall remove all versions older than 60 days):


{

"Rules": [{

"Status": "Enabled",

"Prefix": "",

"NoncurrentVersionExpiration": {

"NoncurrentDays": 60

},

"Expiration": {

"ExpiredObjectDeleteMarker": true

},

"ID": "expire-60days"

}]

}

I am currently testing with an application containing customer data, but I am 
also creating some random test data to create logs I can share.
I will also test whether the versioning itself is the culprit, or if it is the 
lifecycle rule.


I am suspecting versioning (never tried it with resharding).
Can you open a tracker issue with all the information?

Thanks,
Orit

Regards,
Martin

Von: Orit Wasserman >
Datum: Dienstag, 16. Januar 2018 um 18:38
An: Martin Emrich >
Cc: ceph-users >
Betreff: Re: [ceph-users] Bug in RadosGW resharding? Hangs again...

Hi Martin,

On Mon, Jan 15, 2018 at 6:04 PM, Martin Emrich 
> wrote:
Hi!

After having a completely broken radosgw setup due to damaged buckets, I 
completely deleted all rgw pools, and started from scratch.

But my problem is reproducible. After pushing ca. 10 objects into a bucket, 
the resharding process appears to start, and the bucket is now unresponsive.

Sorry to hear that.
Can you share radosgw logs with --debug_rgw=20 --debug_ms=1?

I just see lots of these messages in all rgw logs:

2018-01-15 16:57:45.108826 7fd1779b1700  0 block_while_resharding ERROR: bucket 
is still resharding, please retry
2018-01-15 16:57:45.119184 7fd1779b1700  0 NOTICE: resharding operation on 
bucket index detected, blocking
2018-01-15 16:57:45.260751 7fd1120e6700  0 block_while_resharding ERROR: bucket 
is still resharding, please retry
2018-01-15 16:57:45.280410 7fd1120e6700  0 NOTICE: resharding operation on 
bucket index detected, blocking
2018-01-15 16:57:45.300775 7fd15b979700  0 block_while_resharding ERROR: bucket 
is still resharding, please retry
2018-01-15 16:57:45.300971 7fd15b979700  0 WARNING: set_req_state_err 
err_no=2300 resorting to 500
2018-01-15 16:57:45.301042 7fd15b979700  0 ERROR: 
RESTFUL_IO(s)->complete_header() returned err=Input/output error

One radosgw process and two OSDs housing the bucket index/metadata are still 
busy, but it seems to be stuck again.

How long is this resharding process supposed to take? I cannot believe that an 
application is supposed to block for more than half an hour...

I feel inclined to open a bug report, but I am as yet unsure where the problem 
lies.

Some information:

* 3 RGW processes, 3 OSD hosts with 12 HDD OSDs and 6 SSD OSDs
* Ceph 12.2.2
* Auto-Resharding on, Bucket Versioning & Lifecycle rule enabled.

What life cycle rules do you use?

Regards,
Orit
Thanks,

Martin

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] MDS injectargs

2018-01-17 Thread Eugen Block

Ah, I think I got it:

ceph@host1:~> ceph tell mds.* config set mds_cache_memory_limit 204800
[...]
mds.host2: Set mds_cache_memory_limit to 204800
[...]
mds.host1: Set mds_cache_memory_limit to 204800

:-)


Zitat von Florent B :


Of course, specifying mds.NAME is working (without *), but there's no
way to do it on all MDS at the same time ?


On 17/01/2018 12:54, Florent B wrote:

That's what I did, I run it on the active MDS server.

I run 12.2.2 version.


On 17/01/2018 12:53, Eugen Block wrote:

Can you try it on one of your MDS servers? It should work there.

Zitat von Florent B :


Hi,

Thank you but I got :

    admin_socket: exception getting command descriptions: [Errno 2] No
such file or directory


On 17/01/2018 12:47, Eugen Block wrote:

Hi,

try it with

ceph daemon mds.* config set mds_cache_size 0

Regards,
Eugen


Zitat von Florent B :


Hi,

I would like to reset "mds_cache_size" to its default value (0 in
Luminous).

So I do :

    ceph tell mds.* injectargs '--mds_cache_size 0'

I also tried :

    ceph tell mds.* injectargs '--mds_cache_size = 0'

    ceph tell mds.* injectargs '--mds_cache_size=0'

But I always have errors like these for each MDS :

    2018-01-17 12:39:06.134654 7fdc557fa700  0 client.475461118
ms_handle_reset on 10.111.0.3:6800/2482597610
    Error EPERM: problem getting command descriptions from mds.host3
    mds.host3: problem getting command descriptions from mds.host3

My client.admin keyring is :

    client.admin
        key: X
        auid: 0
        caps: [mds] allow
        caps: [mgr] allow *
        caps: [mon] allow *
        caps: [osd] allow *

How can I change this value without restarting services ?

Thank you.

Florent

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com




___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com




___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com




--
Eugen Block voice   : +49-40-559 51 75
NDE Netzdesign und -entwicklung AG  fax : +49-40-559 51 77
Postfach 61 03 15
D-22423 Hamburg e-mail  : ebl...@nde.ag

Vorsitzende des Aufsichtsrates: Angelika Mozdzen
  Sitz und Registergericht: Hamburg, HRB 90934
  Vorstand: Jens-U. Mozdzen
   USt-IdNr. DE 814 013 983

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] MDS injectargs

2018-01-17 Thread Eugen Block
I don't know of one yet, though it would be helpful. Maybe one of the  
more experienced users can enlighten us. :-)



Zitat von Florent B :


Of course, specifying mds.NAME is working (without *), but there's no
way to do it on all MDS at the same time ?


On 17/01/2018 12:54, Florent B wrote:

That's what I did, I run it on the active MDS server.

I run 12.2.2 version.


On 17/01/2018 12:53, Eugen Block wrote:

Can you try it on one of your MDS servers? It should work there.

Zitat von Florent B :


Hi,

Thank you but I got :

    admin_socket: exception getting command descriptions: [Errno 2] No
such file or directory


On 17/01/2018 12:47, Eugen Block wrote:

Hi,

try it with

ceph daemon mds.* config set mds_cache_size 0

Regards,
Eugen


Zitat von Florent B :


Hi,

I would like to reset "mds_cache_size" to its default value (0 in
Luminous).

So I do :

    ceph tell mds.* injectargs '--mds_cache_size 0'

I also tried :

    ceph tell mds.* injectargs '--mds_cache_size = 0'

    ceph tell mds.* injectargs '--mds_cache_size=0'

But I always have errors like these for each MDS :

    2018-01-17 12:39:06.134654 7fdc557fa700  0 client.475461118
ms_handle_reset on 10.111.0.3:6800/2482597610
    Error EPERM: problem getting command descriptions from mds.host3
    mds.host3: problem getting command descriptions from mds.host3

My client.admin keyring is :

    client.admin
        key: X
        auid: 0
        caps: [mds] allow
        caps: [mgr] allow *
        caps: [mon] allow *
        caps: [osd] allow *

How can I change this value without restarting services ?

Thank you.

Florent

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com




___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com




___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com




--
Eugen Block voice   : +49-40-559 51 75
NDE Netzdesign und -entwicklung AG  fax : +49-40-559 51 77
Postfach 61 03 15
D-22423 Hamburg e-mail  : ebl...@nde.ag

Vorsitzende des Aufsichtsrates: Angelika Mozdzen
  Sitz und Registergericht: Hamburg, HRB 90934
  Vorstand: Jens-U. Mozdzen
   USt-IdNr. DE 814 013 983

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] MDS injectargs

2018-01-17 Thread Eugen Block

Oh, okay... can you run it on a specific host instead of "mds.*"?

I only get the error if I try to run the command on host 1 for host 2:

ceph@host1:~> ceph daemon mds.host2 config show
admin_socket: exception getting command descriptions: [Errno 2] No  
such file or directory

ceph@host1:~> ceph daemon mds.host1 config show | grep mds_cache_size
"mds_cache_size": "0",



Zitat von Florent B :


That's what I did, I run it on the active MDS server.

I run 12.2.2 version.


On 17/01/2018 12:53, Eugen Block wrote:

Can you try it on one of your MDS servers? It should work there.

Zitat von Florent B :


Hi,

Thank you but I got :

    admin_socket: exception getting command descriptions: [Errno 2] No
such file or directory


On 17/01/2018 12:47, Eugen Block wrote:

Hi,

try it with

ceph daemon mds.* config set mds_cache_size 0

Regards,
Eugen


Zitat von Florent B :


Hi,

I would like to reset "mds_cache_size" to its default value (0 in
Luminous).

So I do :

    ceph tell mds.* injectargs '--mds_cache_size 0'

I also tried :

    ceph tell mds.* injectargs '--mds_cache_size = 0'

    ceph tell mds.* injectargs '--mds_cache_size=0'

But I always have errors like these for each MDS :

    2018-01-17 12:39:06.134654 7fdc557fa700  0 client.475461118
ms_handle_reset on 10.111.0.3:6800/2482597610
    Error EPERM: problem getting command descriptions from mds.host3
    mds.host3: problem getting command descriptions from mds.host3

My client.admin keyring is :

    client.admin
        key: X
        auid: 0
        caps: [mds] allow
        caps: [mgr] allow *
        caps: [mon] allow *
        caps: [osd] allow *

How can I change this value without restarting services ?

Thank you.

Florent

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com






___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com






___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com




--
Eugen Block voice   : +49-40-559 51 75
NDE Netzdesign und -entwicklung AG  fax : +49-40-559 51 77
Postfach 61 03 15
D-22423 Hamburg e-mail  : ebl...@nde.ag

Vorsitzende des Aufsichtsrates: Angelika Mozdzen
  Sitz und Registergericht: Hamburg, HRB 90934
  Vorstand: Jens-U. Mozdzen
   USt-IdNr. DE 814 013 983

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] MDS injectargs

2018-01-17 Thread Eugen Block

Can you try it on one of your MDS servers? It should work there.

Zitat von Florent B :


Hi,

Thank you but I got :

    admin_socket: exception getting command descriptions: [Errno 2] No
such file or directory


On 17/01/2018 12:47, Eugen Block wrote:

Hi,

try it with

ceph daemon mds.* config set mds_cache_size 0

Regards,
Eugen


Zitat von Florent B :


Hi,

I would like to reset "mds_cache_size" to its default value (0 in
Luminous).

So I do :

    ceph tell mds.* injectargs '--mds_cache_size 0'

I also tried :

    ceph tell mds.* injectargs '--mds_cache_size = 0'

    ceph tell mds.* injectargs '--mds_cache_size=0'

But I always have errors like these for each MDS :

    2018-01-17 12:39:06.134654 7fdc557fa700  0 client.475461118
ms_handle_reset on 10.111.0.3:6800/2482597610
    Error EPERM: problem getting command descriptions from mds.host3
    mds.host3: problem getting command descriptions from mds.host3

My client.admin keyring is :

    client.admin
        key: X
        auid: 0
        caps: [mds] allow
        caps: [mgr] allow *
        caps: [mon] allow *
        caps: [osd] allow *

How can I change this value without restarting services ?

Thank you.

Florent

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com






___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com




--
Eugen Block voice   : +49-40-559 51 75
NDE Netzdesign und -entwicklung AG  fax : +49-40-559 51 77
Postfach 61 03 15
D-22423 Hamburg e-mail  : ebl...@nde.ag

Vorsitzende des Aufsichtsrates: Angelika Mozdzen
  Sitz und Registergericht: Hamburg, HRB 90934
  Vorstand: Jens-U. Mozdzen
   USt-IdNr. DE 814 013 983

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] MDS injectargs

2018-01-17 Thread Eugen Block

Hi,

try it with

ceph daemon mds.* config set mds_cache_size 0

Regards,
Eugen


Zitat von Florent B :


Hi,

I would like to reset "mds_cache_size" to its default value (0 in Luminous).

So I do :

    ceph tell mds.* injectargs '--mds_cache_size 0'

I also tried :

    ceph tell mds.* injectargs '--mds_cache_size = 0'

    ceph tell mds.* injectargs '--mds_cache_size=0'

But I always have errors like these for each MDS :

    2018-01-17 12:39:06.134654 7fdc557fa700  0 client.475461118
ms_handle_reset on 10.111.0.3:6800/2482597610
    Error EPERM: problem getting command descriptions from mds.host3
    mds.host3: problem getting command descriptions from mds.host3

My client.admin keyring is :

    client.admin
        key: X
        auid: 0
        caps: [mds] allow
        caps: [mgr] allow *
        caps: [mon] allow *
        caps: [osd] allow *

How can I change this value without restarting services ?

Thank you.

Florent

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com




--
Eugen Block voice   : +49-40-559 51 75
NDE Netzdesign und -entwicklung AG  fax : +49-40-559 51 77
Postfach 61 03 15
D-22423 Hamburg e-mail  : ebl...@nde.ag

Vorsitzende des Aufsichtsrates: Angelika Mozdzen
  Sitz und Registergericht: Hamburg, HRB 90934
  Vorstand: Jens-U. Mozdzen
   USt-IdNr. DE 814 013 983

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] After Luminous upgrade: ceph-fuse clients failing to respond to cache pressure

2018-01-17 Thread John Spray
On Tue, Jan 16, 2018 at 8:50 PM, Andras Pataki
 wrote:
> Dear Cephers,
>
> We've upgraded the back end of our cluster from Jewel (10.2.10) to Luminous
> (12.2.2).  The upgrade went smoothly for the most part, except we seem to be
> hitting an issue with cephfs.  After about a day or two of use, the MDS
> start complaining about clients failing to respond to cache pressure:

What's the OS, kernel version and fuse version on the hosts where the
clients are running?

There have been some issues with ceph-fuse losing the ability to
properly invalidate cached items when certain updated OS packages were
installed.

Specifically, ceph-fuse checks the kernel version against 3.18.0 to
decide which invalidation method to use, and if your OS has backported
new behaviour to a low-version-numbered kernel, that can confuse it.

John
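
A quick way to gather that information on each client host (a sketch; the rpm
query assumes an RPM-based distribution such as the CentOS hosts mentioned
here, so use the distribution's own package tool otherwise):

  uname -r              # kernel version
  rpm -q fuse           # fuse package version
  ceph-fuse --version   # ceph client version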

>
> [root@cephmon00 ~]# ceph -s
>   cluster:
> id: d7b33135-0940-4e48-8aa6-1d2026597c2f
> health: HEALTH_WARN
> 1 MDSs have many clients failing to respond to cache pressure
> noout flag(s) set
> 1 osds down
>
>   services:
> mon: 3 daemons, quorum cephmon00,cephmon01,cephmon02
> mgr: cephmon00(active), standbys: cephmon01, cephmon02
> mds: cephfs-1/1/1 up  {0=cephmon00=up:active}, 2 up:standby
> osd: 2208 osds: 2207 up, 2208 in
>  flags noout
>
>   data:
> pools:   6 pools, 42496 pgs
> objects: 919M objects, 3062 TB
> usage:   9203 TB used, 4618 TB / 13822 TB avail
> pgs: 42470 active+clean
>  22active+clean+scrubbing+deep
>  4 active+clean+scrubbing
>
>   io:
> client:   56122 kB/s rd, 18397 kB/s wr, 84 op/s rd, 101 op/s wr
>
> [root@cephmon00 ~]# ceph health detail
> HEALTH_WARN 1 MDSs have many clients failing to respond to cache pressure;
> noout flag(s) set; 1 osds down
> MDS_CLIENT_RECALL_MANY 1 MDSs have many clients failing to respond to cache
> pressure
> mdscephmon00(mds.0): Many clients (103) failing to respond to cache
> pressureclient_count: 103
> OSDMAP_FLAGS noout flag(s) set
> OSD_DOWN 1 osds down
> osd.1296 (root=root-disk,pod=pod0-disk,host=cephosd008-disk) is down
>
>
> We are using exclusively the 12.2.2 fuse client on about 350 nodes or so
> (out of which it seems 100 are not responding to cache pressure in this
> log).  When this happens, clients appear pretty sluggish also (listing
> directories, etc.).  After bouncing the MDS, everything returns to normal
> after the failover for a while.  Ignore the message about 1 OSD down, that
> corresponds to a failed drive and all data has been re-replicated since.
>
> We were also using the 12.2.2 fuse client with the Jewel back end before the
> upgrade, and have not seen this issue.
>
> We are running with a larger MDS cache than usual, we have mds_cache_size
> set to 4 million.  All other MDS configs are the defaults.
>
> Is this a known issue?  If not, any hints on how to further diagnose the
> problem?
>
> Andras
>
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Bug in RadosGW resharding? Hangs again...

2018-01-17 Thread Orit Wasserman
On Wed, Jan 17, 2018 at 11:45 AM, Martin Emrich 
wrote:

> Hi Orit!
>
>
>
> I did some tests, and indeed the combination of Versioning/Lifecycle with
> Resharding is the problem:
>
>
>
>- If I do not enable Versioning/Lifecycle, Autoresharding works fine.
>- If I disable Autoresharding but enable Versioning+Lifecycle, pushing
>data works fine, until I manually reshard. This hangs also.
>
>
>
Thanks for testing :) This is very helpful!

My lifecycle rule (which shall remove all versions older than 60 days):
>
>
>
> {
>
> "Rules": [{
>
> "Status": "Enabled",
>
> "Prefix": "",
>
> "NoncurrentVersionExpiration": {
>
> "NoncurrentDays": 60
>
> },
>
> "Expiration": {
>
> "ExpiredObjectDeleteMarker": true
>
> },
>
> "ID": "expire-60days"
>
> }]
>
> }
>
>
>
> I am currently testing with an application containing customer data, but I
> am also creating some random test data to create logs I can share.
>
> I will also test whether the versioning itself is the culprit, or if it is
> the lifecycle rule.
>
>
>

I am suspecting versioning (never tried it with resharding).
Can you open a tracker issue with all the information?

Thanks,
Orit

Regards,
>
> Martin
>
>
>
> *Von: *Orit Wasserman 
> *Datum: *Dienstag, 16. Januar 2018 um 18:38
> *An: *Martin Emrich 
> *Cc: *ceph-users 
> *Betreff: *Re: [ceph-users] Bug in RadosGW resharding? Hangs again...
>
>
>
> Hi Martin,
>
>
>
> On Mon, Jan 15, 2018 at 6:04 PM, Martin Emrich 
> wrote:
>
> Hi!
>
> After having a completely broken radosgw setup due to damaged buckets, I
> completely deleted all rgw pools, and started from scratch.
>
> But my problem is reproducible. After pushing ca. 10 objects into a
> bucket, the resharding process appears to start, and the bucket is now
> unresponsive.
>
>
>
> Sorry to hear that.
>
> Can you share radosgw logs with --debug_rgw=20 --debug_ms=1?
>
>
>
> I just see lots of these messages in all rgw logs:
>
> 2018-01-15 16:57:45.108826 7fd1779b1700  0 block_while_resharding ERROR:
> bucket is still resharding, please retry
> 2018-01-15 16:57:45.119184 7fd1779b1700  0 NOTICE: resharding operation on
> bucket index detected, blocking
> 2018-01-15 16:57:45.260751 7fd1120e6700  0 block_while_resharding ERROR:
> bucket is still resharding, please retry
> 2018-01-15 16:57:45.280410 7fd1120e6700  0 NOTICE: resharding operation on
> bucket index detected, blocking
> 2018-01-15 16:57:45.300775 7fd15b979700  0 block_while_resharding ERROR:
> bucket is still resharding, please retry
> 2018-01-15 16:57:45.300971 7fd15b979700  0 WARNING: set_req_state_err
> err_no=2300 resorting to 500
> 2018-01-15 16:57:45.301042 7fd15b979700  0 ERROR: 
> RESTFUL_IO(s)->complete_header()
> returned err=Input/output error
>
> One radosgw process and two OSDs housing the bucket index/metadata are
> still busy, but it seems to be stuck again.
>
> How long is this resharding process supposed to take? I cannot believe
> that an application is supposed to block for more than half an hour...
>
> I feel inclined to open a bug report, but I am as yet unsure where the
> problem lies.
>
> Some information:
>
> * 3 RGW processes, 3 OSD hosts with 12 HDD OSDs and 6 SSD OSDs
> * Ceph 12.2.2
> * Auto-Resharding on, Bucket Versioning & Lifecycle rule enabled.
>
>
>
> What life cycle rules do you use?
>
>
>
> Regards,
>
> Orit
>
> Thanks,
>
> Martin
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
>
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] After Luminous upgrade: ceph-fuse clients failing to respond to cache pressure

2018-01-17 Thread Andras Pataki

Burkhard,

Thanks very much for the info - I'll try the MDS with a 16GB 
mds_cache_memory_limit (which leaves some buffer for extra memory 
consumption on the machine), and report back if there are any issues 
remaining.
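
For reference, this is roughly how I plan to apply it - just a sketch, with 
the value being 16 GiB in bytes and the daemon name taken from the status 
output earlier in the thread:

# in ceph.conf on the MDS hosts
[mds]
mds_cache_memory_limit = 17179869184

# or injected at runtime and verified on the MDS host
ceph tell mds.cephmon00 injectargs '--mds_cache_memory_limit=17179869184'
ceph daemon mds.cephmon00 config get mds_cache_memory_limit

With the ~50% allocation overhead mentioned below, I'd plan for roughly 24GB 
of RSS for the MDS process.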


Andras


On 01/17/2018 02:40 AM, Burkhard Linke wrote:

Hi,


On 01/16/2018 09:50 PM, Andras Pataki wrote:

Dear Cephers,


*snipsnap*




We are running with a larger MDS cache than usual; we have 
mds_cache_size set to 4 million.  All other MDS configs are the 
defaults.


AFAIK the MDS cache management in luminous has changed, focusing on 
memory size instead of number of inodes/caps/whatever.


We had to replace mds_cache_size with mds_cache_memory_limit to get the 
MDS cache working as expected again. This may also be the cause of the 
issue, since the default configuration uses quite a small cache. 
You can check this with 'ceph daemonperf mds.XYZ' on the MDS host.
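
For example, something like this should show whether the cache is actually 
sitting near the limit (just a sketch; the daemon name is the one from this 
thread, adjust to yours):

# on the MDS host
ceph daemonperf mds.cephmon00
ceph daemon mds.cephmon00 perf dump | grep -A 12 '"mds_mem"'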


If you change the memory limit you also need to account for memory 
allocation overhead. There was a thread about this on the mailing list 
some weeks ago; you should expect at least 50% overhead. As with previous 
releases this is not a hard limit, and the process may consume more memory 
in certain situations. Also, since bluestore OSDs no longer use the kernel 
page cache but maintain their own memory cache, you need to plan the 
memory consumption of all Ceph daemons on a host together.


As an example, our mds is configured with mds_cache_memory_limit = 
80 and is consuming about 12 GB memory RSS.


Regards,
Burkhard
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Re: Two datacenter resilient design with a quorum site

2018-01-17 Thread Vincent Godin
Hello Alex,

We have a similar design for our Ceph cluster: two datacenters at short
distance (sharing the same layer 2 network) and one datacenter much further
away (more than 100 km). Let's call these sites A1, A2 and B.

We set 2 Mons on A1, 2 Mons on A2 and 1 Mon on B. A1 and A2 share the
same layer 2 network; we need routing to reach B.
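
As an illustration, the monitor placement then looks like this in ceph.conf
(hostnames and addresses below are made up for the example, not our real
ones):

# 2 mons in A1, 2 mons in A2, 1 mon in B
[global]
mon_initial_members = mon-a1-1, mon-a1-2, mon-a2-1, mon-a2-2, mon-b
mon_host = 10.1.0.11,10.1.0.12,10.2.0.11,10.2.0.12,10.3.0.11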

We set up an HSRP gateway on A1 & A2 to reach site B. Let's call its
members GwA1 and GwA2, with GwA1 active by default.

We set up an HSRP gateway on site B as well; call its members GwB1 and
GwB2, with GwB1 active by default. GwB1 connects to A1 and A2 via GwA1,
and GwB2 connects to A1 and A2 via GwA2. We set up a simple LACP bundle
between the GwB1 and GwA1 ports, and another between the GwB2 and GwA2
ports (so if the GwA1 port goes down, the GwB1 port goes down too).

So if everything is OK, the Mon on site B can see all OSDs and Mons
on both sites A1 & A2 via GwB1, then GwA1. Quorum is reached and Ceph
is healthy.

If the A2 site is down, the Mon on site B can still see all OSDs and Mons on
site A1 via GwB1, then GwA1. Quorum is reached and Ceph stays available.

If the A1 site is down, both HSRP groups fail over. The Mon on site B will
see the Mons and OSDs of the A2 site via GwB2, then GwA2. Quorum is reached
and Ceph is still available.

If the L2 links between A1 & A2 are cut, the A2 site will be isolated.
The Mon on site B can see all OSDs and Mons on A1 via GwB1, then GwA1,
but cannot see the Mons and OSDs of the A2 site because of the link
failure. Quorum will be reached only on the A1 side, with 3 Mons (the 2
on A1 plus the 1 on B), and Ceph will still be available.

I hope I have been clear enough. Tell me if you need more details.
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Bug in RadosGW resharding? Hangs again...

2018-01-17 Thread Martin Emrich
Hi Orit!

I did some tests, and indeed the combination of Versioning/Lifecycle with 
Resharding is the problem:


  *   If I do not enable Versioning/Lifecycle, Autoresharding works fine.
  *   If I disable Autoresharding but enable Versioning+Lifecycle, pushing data 
works fine until I manually reshard; the manual reshard hangs as well (see the 
status-check sketch below).
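
When the manual reshard hangs, its state can be checked with the reshard 
subcommands (a sketch; the bucket name is a placeholder, and this assumes the 
reshard commands shipped with 12.2.x):

radosgw-admin reshard list
radosgw-admin reshard status --bucket=my-test-bucket   # placeholder bucket name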

My lifecycle rule (which shall remove all versions older than 60 days):


{

"Rules": [{

"Status": "Enabled",

"Prefix": "",

"NoncurrentVersionExpiration": {

"NoncurrentDays": 60

},

"Expiration": {

"ExpiredObjectDeleteMarker": true

},

"ID": "expire-60days"

}]

}
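
For anyone reproducing this: the rule above can be applied with e.g. the AWS 
CLI against the RGW endpoint, roughly like this (bucket name and endpoint are 
placeholders):

# lifecycle.json contains the rule document shown above
aws s3api put-bucket-lifecycle-configuration \
  --bucket my-test-bucket \
  --lifecycle-configuration file://lifecycle.json \
  --endpoint-url http://rgw.example.com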

I am currently testing with an application containing customer data, but I am 
also generating some random test data so that I can share logs.
I will also test whether the versioning itself is the culprit, or whether it is 
the lifecycle rule.
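
To isolate the versioning part, the bucket can be switched to versioned 
without any lifecycle rule, e.g. (again, bucket name and endpoint are 
placeholders):

aws s3api put-bucket-versioning \
  --bucket my-test-bucket \
  --versioning-configuration Status=Enabled \
  --endpoint-url http://rgw.example.com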

Regards,
Martin

From: Orit Wasserman 
Date: Tuesday, 16 January 2018 at 18:38
To: Martin Emrich 
Cc: ceph-users 
Subject: Re: [ceph-users] Bug in RadosGW resharding? Hangs again...

Hi Martin,

On Mon, Jan 15, 2018 at 6:04 PM, Martin Emrich 
> wrote:
Hi!

After having a completely broken radosgw setup due to damaged buckets, I 
completely deleted all rgw pools, and started from scratch.

But my problem is reproducible. After pushing ca. 10 objects into a bucket, 
the resharding process appears to start, and the bucket is now unresponsive.

Sorry to hear that.
Can you share radosgw logs with --debug_rgw=20 --debug_ms=1?

I just see lots of these messages in all rgw logs:

2018-01-15 16:57:45.108826 7fd1779b1700  0 block_while_resharding ERROR: bucket 
is still resharding, please retry
2018-01-15 16:57:45.119184 7fd1779b1700  0 NOTICE: resharding operation on 
bucket index detected, blocking
2018-01-15 16:57:45.260751 7fd1120e6700  0 block_while_resharding ERROR: bucket 
is still resharding, please retry
2018-01-15 16:57:45.280410 7fd1120e6700  0 NOTICE: resharding operation on 
bucket index detected, blocking
2018-01-15 16:57:45.300775 7fd15b979700  0 block_while_resharding ERROR: bucket 
is still resharding, please retry
2018-01-15 16:57:45.300971 7fd15b979700  0 WARNING: set_req_state_err 
err_no=2300 resorting to 500
2018-01-15 16:57:45.301042 7fd15b979700  0 ERROR: 
RESTFUL_IO(s)->complete_header() returned err=Input/output error

One radosgw process and two OSDs housing the bucket index/metadata are still 
busy, but it seems to be stuck again.

How long is this resharding process supposed to take? I cannot believe that an 
application is supposed to block for more than half an hour...

I feel inclined to open a bug report, but I am as yet unsure where the problem 
lies.

Some information:

* 3 RGW processes, 3 OSD hosts with 12 HDD OSDs and 6 SSD OSDs
* Ceph 12.2.2
* Auto-Resharding on, Bucket Versioning & Lifecycle rule enabled.

What life cycle rules do you use?

Regards,
Orit
Thanks,

Martin

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com