Re: [ceph-users] help me turn off "many more objects than average"
Hi Paul, Yes, all monitors have been restarted. Chad. ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
[ceph-users] help me turn off "many more objects than average"
Hi all, I'm having trouble turning off the warning "1 pools have many more objects per pg than average". I've tried a lot of variations on the setting below; my current ceph.conf has: #... [mon] #... mon_pg_warn_max_object_skew = 0 (the fragment is also written out on its own lines below). All of my monitors have been restarted. It seems like I'm missing something. Syntax error? Wrong section? No vertical blank whitespace allowed? Not supported in Luminous? Thanks! Chad. ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
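For reference, the fragment being described is simply this, in /etc/ceph/ceph.conf on the monitor hosts:

[mon]
mon_pg_warn_max_object_skew = 0

One way to confirm what a daemon actually loaded is the admin socket on the monitor host (the mon id here is hypothetical):

ceph daemon mon.mon01 config get mon_pg_warn_max_object_skew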
Re: [ceph-users] how can time machine know difference between cephfs fuse and kernel client?
Hi All, I think my problem was that I had quotas set at multiple levels of a subtree, and maybe some were conflicting. (E.g. parent said quota=1GB, child said quota=200GB.) I could not reproduce the problem, but setting quotas only on the user's subdirectory, and not elsewhere along the way to the root, fixed the problem. :) I'm actually using the kernel cephfs for Time Machine, plus a .plist file to tell Time Machine not to use too much space. There is an example here: https://www.reddit.com/r/homelab/comments/83vkaz/howto_make_time_machine_backups_on_a_samba/ For Windows, I don't know of a way to give it a hint with a file, so I'm using cephfs quotas. As for AFP, there are a few reasons I decided not to use it. We already have a Samba setup with authentication. SMB Time Machine only works with macOS 10.12 and newer, so if there were time and we had more older clients, netatalk would also make sense. Supposedly AFP is going away someday: https://apple.stackexchange.com/questions/285417/is-afp-slated-to-be-removed-from-future-versions-of-macos Thanks for the responses! Chad. ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
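For the record, the quota mentioned above is just the usual cephfs xattr, now set on the user's own subdirectory only (path and size here are hypothetical):

setfattr -n ceph.quota.max_bytes -v 200000000000 /srv/smb/username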
Re: [ceph-users] luminous ceph-fuse with quotas breaks 'mount' and 'df'
0670 7f0e300b5140 15 inode.get on 0x556a3f128000 0x12291c4.head now 2
2018-08-17 14:34:55.040672 7f0e300b5140 20 client.18814183 _ll_get 0x556a3f128000 0x12291c4 -> 1
2018-08-17 14:34:55.041035 7f0e300b5140 10 client.18814183 ll_register_callbacks cb 0x556a3ee82c80 invalidate_ino_cb 1 invalidate_dentry_cb 1 switch_interrupt_cb 1 remount_cb 1
2018-08-17 14:34:55.048403 7f0e298a6700 10 client.18814183 put_inode on 0x1.head(faked_ino=0 ref=3 ll_ref=0 cap_refs={} open={} mode=40755 size=0/0 nlink=1 mtime=2018-05-16 08:33:31.388505 caps=pAsLsXsFs(0=pAsLsXsFs) has_dir_layout 0x556a3f128c00)
2018-08-17 14:34:55.048421 7f0e298a6700 15 inode.put on 0x556a3f128c00 0x1.head now 2
2018-08-17 14:34:55.048424 7f0e298a6700 10 client.18814183 put_inode on 0x1.head(faked_ino=0 ref=2 ll_ref=0 cap_refs={} open={} mode=40755 size=0/0 nlink=1 mtime=2018-05-16 08:33:31.388505 caps=pAsLsXsFs(0=pAsLsXsFs) has_dir_layout 0x556a3f128c00)
2018-08-17 14:34:55.048429 7f0e298a6700 15 inode.put on 0x556a3f128c00 0x1.head now 1
2018-08-17 14:34:55.051029 7f0e2589e700 1 client.18814183 using remount_cb
2018-08-17 14:34:55.055053 7f0e2509d700 3 client.18814183 ll_getattr 0x12291c4.head
2018-08-17 14:34:55.055070 7f0e2509d700 10 client.18814183 _getattr mask pAsLsXsFs issued=1
2018-08-17 14:34:55.055074 7f0e2509d700 10 client.18814183 fill_stat on 0x12291c4 snap/devhead mode 040555 mtime 2018-08-15 14:09:02.547890 ctime 2018-08-17 14:28:09.654639
2018-08-17 14:34:55.055089 7f0e2509d700 3 client.18814183 ll_getattr 0x12291c4.head = 0
2018-08-17 14:34:55.055100 7f0e2509d700 3 client.18814183 ll_forget 0x12291c4 1
2018-08-17 14:34:55.055102 7f0e2509d700 20 client.18814183 _ll_put 0x556a3f128000 0x12291c4 1 -> 1
2018-08-17 14:34:55.965416 7f0e2a8a8700 10 client.18814183 renew_caps()
2018-08-17 14:34:55.965432 7f0e2a8a8700 15 client.18814183 renew_caps requesting from mds.0
2018-08-17 14:34:55.965436 7f0e2a8a8700 10 client.18814183 renew_caps mds.0
2018-08-17 14:34:55.965504 7f0e2a8a8700 20 client.18814183 trim_cache size 0 max 16384
2018-08-17 14:34:55.967114 7f0e298a6700 10 client.18814183 handle_client_session client_session(renewcaps seq 2) v1 from mds.0
ceph-fuse[30502]: fuse finished with error 0 and tester_r 0
*** Caught signal (Segmentation fault) **
On 07/09/2018 08:48 AM, John Spray wrote:
On Fri, Jul 6, 2018 at 6:30 PM Chad William Seys wrote:
Hi all, I'm having a problem that when I mount cephfs with a quota in the root mount point, no ceph-fuse appears in 'mount' and df reports:
Filesystem 1K-blocks Used Available Use% Mounted on
ceph-fuse 0 0 0 - /srv/smb
If I 'ls' I see the expected files:
# ls -alh
total 6.0K
drwxrwxr-x+ 1 root smbadmin 18G Jul 5 17:06 .
drwxr-xr-x 5 root smbadmin 4.0K Jun 16 2017 ..
drwxrwx---+ 1 smbadmin smbadmin 3.0G Jan 18 10:50 bigfix-relay-cache
drwxrwxr-x+ 1 smbadmin smbadmin 15G Jul 6 11:51 instr_files
drwxrwx---+ 1 smbadmin smbadmin 0 Jul 6 11:50 mcdermott-group
Quotas are being used:
getfattr --only-values -n ceph.quota.max_bytes /srv/smb
1
Turning off the quota at the mountpoint allows df and mount to work correctly. I'm running 12.2.4 on the servers and 12.2.5 on the client.
That's pretty weird, not something I recall seeing before. When quotas are in use, Ceph is implementing the same statfs() hook to report usage to the OS, but it's doing a getattr() call to the MDS inside that function. I wonder if something is going slowly, and perhaps the OS is ignoring filesystems that don't return promptly, to avoid hanging "df" on a misbehaving filesystem?
I'd debug this by setting "debug ms = 1", and finding the client's log in /var/log/ceph. John Is there a bug report for this? Thanks! Chad. ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] how can time machine know difference between cephfs fuse and kernel client?
Also, when using cephfs fuse client, Windows File History reports no space free. Free Space: 0 bytes, Total Space: 186 GB. C. ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
[ceph-users] how can time machine know difference between cephfs fuse and kernel client?
Hello all, I have used cephfs served over Samba to set up a "time capsule" server. However, I could only get this to work using the cephfs kernel module. Time Machine would give errors if cephfs were mounted with fuse. (Sorry, I didn't write down the error messages!) Anyone have an idea how the two methods of mounting are detectable by Time Machine through Samba? Windows 10 File History behaved the same way. Error messages are "Could not enable File History. There is not enough space on the disk". (Although it shows the correct amount of space.) And "File History doesn't recognize this drive." I'd like to use cephfs fuse for the quota support. (The kernel client is said to support quotas with Mimic and kernel version >= 4.17, but that is too cutting edge for me ATM.) Thanks! Chad. ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
[ceph-users] cephfs fuse versus kernel performance
Hi all, Anyone know of benchmarks of cephfs through fuse versus kernel? Thanks! Chad. ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] luminous ceph-fuse with quotas breaks 'mount' and 'df'
Hi Greg, Am I reading this right that you've got a 1-*byte* quota but have gigabytes of data in the tree? I have no idea what that might do to the system, but it wouldn't totally surprise me if that was messing something up. Since <10KB definitely rounds towards 0... Yeah, that directory only contains subdirectories, and those subdirs have separate quotas set. E.g. getfattr --only-values -n ceph.quota.max_bytes /srv/smb/mcdermott-group/ 2 ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
[ceph-users] luminous ceph-fuse with quotas breaks 'mount' and 'df'
Hi all, I'm having a problem that when I mount cephfs with a quota in the root mount point, no ceph-fuse appears in 'mount' and df reports:
Filesystem 1K-blocks Used Available Use% Mounted on
ceph-fuse 0 0 0 - /srv/smb
If I 'ls' I see the expected files:
# ls -alh
total 6.0K
drwxrwxr-x+ 1 root smbadmin 18G Jul 5 17:06 .
drwxr-xr-x 5 root smbadmin 4.0K Jun 16 2017 ..
drwxrwx---+ 1 smbadmin smbadmin 3.0G Jan 18 10:50 bigfix-relay-cache
drwxrwxr-x+ 1 smbadmin smbadmin 15G Jul 6 11:51 instr_files
drwxrwx---+ 1 smbadmin smbadmin 0 Jul 6 11:50 mcdermott-group
Quotas are being used:
getfattr --only-values -n ceph.quota.max_bytes /srv/smb
1
Turning off the quota at the mountpoint allows df and mount to work correctly. I'm running 12.2.4 on the servers and 12.2.5 on the client. Is there a bug report for this? Thanks! Chad. ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
[ceph-users] osds with different disk sizes may killing performance (?? ?)
You'll find it said time and time again on the ML... avoid disks of different sizes in the same cluster. It's a headache that sucks. It's not impossible, it's not even overly hard to pull off... but it's very easy to cause a mess and a lot of headaches. It will also make it harder to diagnose performance issues in the cluster. Not very practical for clusters which aren't new. There is no way to fill up all disks evenly with the same number of Bytes and then stop filling the small disks when they're full and only continue filling the larger disks. This is possible by adjusting crush weights (a sketch of the command is below). Initially the smaller drives are weighted more highly than larger drives. As data gets added the weights are changed so that larger drives continue to fill while no drive becomes overfull. What will happen if you are filling all disks evenly with Bytes instead of % is that the small disks will get filled completely and all writes to the cluster will block until you do something to reduce the amount used on the full disks. That means the crush weights were not adjusted correctly as the cluster filled. But in this case you would have a steep drop off of performance. When you reach the fill level where small drives do not accept more data, suddenly you would have a performance cliff where only your larger disks are doing new writes, and only larger disks doing reads on new data. Good point! Although if this is implemented by changing crush weights, adjusting the weights as the cluster fills will cause the data to churn, and the new data will not only be assigned to larger drives. :) Chad. ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
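For concreteness, the kind of adjustment being described is a per-OSD crush reweight; the id and weight here are made up:

ceph osd crush reweight osd.12 1.2

Lowering the crush weight of a nearly-full small OSD (or raising the weight of larger ones) shifts PGs away from it without taking it out of the cluster.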
Re: [ceph-users] osds with different disk sizes may killing performance (?? ?)
Hello, I think your observations suggest that, to a first approximation, filling drives with bytes to the same absolute level is better for performance than filling drives to the same percentage full. Assuming random distribution of PGs, this would cause the smallest drives to be as active as the largest drives. E.g. if every drive had 1TB of data, each would be equally likely to contain the PG of interest. Of course, as more data was added the smallest drives could not hold more and the larger drives would become more active, but at least the smaller drives would be as active as possible. Thanks! Chad. ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
[ceph-users] rbd and cephfs (data) in one pool?
Hello, Is it possible to place rbd and cephfs data in the same pool? Thanks! Chad. ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] what does associating ceph pool to application do?
Thanks John! I see that a pool can have more than one "application". Should I feel free to combine uses (e.g. cephfs, rbd) or is this contraindicated? Thanks! Chad. > Just to stern this up a bit... In the future, you may find that things stop working if you remove the application tags. For example, in Mimic, application tags will be involved in authentication for CephFS, and if you removed the cephfs tags from your data pool, your clients would stop being able to access it. ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] what does associating ceph pool to application do?
Scrolled down a bit and found this blog post: https://ceph.com/community/new-luminous-pool-tags/ If things haven't changed: Could someone tell me / link to what associating a ceph pool to an application does? ATM it's a tag and does nothing to the pool/PG/etc structure I hope this info includes why "Disabling an application within a pool might result in loss of application functionality" when running 'ceph osd application disable ' A stern warning to ignore. C. ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
[ceph-users] what does associating ceph pool to application do?
Hi All, Could someone tell me / link to what associating a ceph pool to an application does? I hope this info includes why "Disabling an application within a pool might result in loss of application functionality" when running 'ceph osd application disable ' Thanks! Chad. ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] erasure-coded with overwrites versus erasure-coded with cache tiering
Thanks David, When I convert to bluestore and the dust settles I hope to do a same-cluster comparison and post here! Chad. On 09/30/2017 07:29 PM, David Turner wrote: > In my case, the replica-3 and k2m2 are stored on the same spinning disks. That is exactly what I meant by same pool. The only way for a cache to make sense would be if the data being written or read will be modified or heavily read for X amount of time and then ignored. If things are rarely read, and randomly so, then promoting them into a cache tier just makes you wait for the object to be promoted to cache before you read it once or twice before it sits in there until it's demoted again. If you have random io and anything can really be read next, then a cache tier on the same disks as the EC pool will only cause things to be promoted and demoted for no apparent reason. You can always test this for your use case and see if it helps enough to create a pool and tier that you need to manage or not. I'm planning to remove my cephfs cache tier once I upgrade to Luminous as I only have it as a requirement. It causes me to slow down my writes heavily as eviction io is useless and wasteful of cluster io for me. I haven't checked on the process for that yet, but I'm assuming it's a set command on the pool that will then allow me to disable and remove the cache tier. I mention that because if it is that easy to enable/disable, then testing it should be simple and easy to compare. On Sat, Sep 30, 2017, 8:10 PM Chad William Seys <cws...@physics.wisc.edu> wrote: Hi David, Thanks for the clarification. Reminded me of some details I forgot to mention. In my case, the replica-3 and k2m2 are stored on the same spinning disks. (Mainly using EC for "compression" b/c with the EC k2m2 setting, a PG only takes up the same amount of space as a replica-2 while allowing 2 disks to fail like replica-3 without loss.) I'm using this setup as RBDs and cephfs to store things like local mirrors of linux packages and drive images to be broadcast over network. Seems to be about as fast as a normal hard drive. :) So is this the situation where the "cache tier [is] on the same root of osds as the EC pool"? Thanks for the advice! Chad. On 09/30/2017 12:32 PM, David Turner wrote: > I can only think of 1 type of cache tier usage that is faster if you are > using the cache tier on the same root of osds as the EC pool. That is > cold storage where the file is written initially, modified and read for > the first X hours, and then remains in cold storage for the remainder of > its life with rare reads. > > Other than that there are a few use cases using a faster root of osds > that might make sense, but generally it's still better to utilize that > faster storage in the rest of the osd stack either as journals for > filestore or Wal/DB partitions for bluestore. > > > On Sat, Sep 30, 2017, 12:56 PM Chad William Seys > <cws...@physics.wisc.edu> wrote: > > Hi all, > Now that Luminous supports direct writing to EC pools I was > wondering > if one can get more performance out of an erasure-coded pool with > overwrites or an erasure-coded pool with a cache tier? > I currently have a 3 replica pool in front of a k2m2 erasure coded > pool. Luminous documentation on cache tiering > http://docs.ceph.com/docs/luminous/rados/operations/cache-tiering/#a-word-of-caution > makes it sound like cache tiering is usually not recommended. > > Thanks!
> Chad. > ___ > ceph-users mailing list > ceph-users@lists.ceph.com > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com > ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] erasure-coded with overwrites versus erasure-coded with cache tiering
Hi David, Thanks for the clarification. Reminded me of some details I forgot to mention. In my case, the replica-3 and k2m2 are stored on the same spinning disks. (Mainly using EC for "compression" b/c with the EC k2m2 setting, a PG only takes up the same amount of space as a replica-2 while allowing 2 disks to fail like replica-3 without loss.) I'm using this setup as RBDs and cephfs to store things like local mirrors of linux packages and drive images to be broadcast over network. Seems to be about as fast as a normal hard drive. :) So is this the situation where the "cache tier [is] on the same root of osds as the EC pool"? Thanks for the advice! Chad. On 09/30/2017 12:32 PM, David Turner wrote: I can only think of 1 type of cache tier usage that is faster if you are using the cache tier on the same root of osds as the EC pool. That is cold storage where the file is written initially, modified and read for the first X hours, and then remains in cold storage for the remainder of its life with rare reads. Other than that there are a few use cases using a faster root of osds that might make sense, but generally it's still better to utilize that faster storage in the rest of the osd stack either as journals for filestore or Wal/DB partitions for bluestore. On Sat, Sep 30, 2017, 12:56 PM Chad William Seys <cws...@physics.wisc.edu> wrote: Hi all, Now that Luminous supports direct writing to EC pools I was wondering if one can get more performance out of an erasure-coded pool with overwrites or an erasure-coded pool with a cache tier? I currently have a 3 replica pool in front of a k2m2 erasure coded pool. Luminous documentation on cache tiering http://docs.ceph.com/docs/luminous/rados/operations/cache-tiering/#a-word-of-caution makes it sound like cache tiering is usually not recommended. Thanks! Chad. ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
[ceph-users] erasure-coded with overwrites versus erasure-coded with cache tiering
Hi all, Now that Luminous supports direct writing to EC pools I was wondering if one can get more performance out of an erasure-coded pool with overwrites or an erasure-coded pool with a cache tier? I currently have a 3 replica pool in front of a k2m2 erasure coded pool. Luminous documentation on cache tiering http://docs.ceph.com/docs/luminous/rados/operations/cache-tiering/#a-word-of-caution makes it sound like cache tiering is usually not recommended. Thanks! Chad. ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
[ceph-users] mds fails to start after upgrading to 10.2.6
Hi All, After upgrading to 10.2.6 on Debian Jessie, the MDS server fails to start. Below is what is written to the log file from attempted start to failure. Any ideas? I'll probably try rolling back to 10.2.5 in the meantime. Thanks! C.
On 03/16/2017 12:48 PM, r...@mds01.hep.wisc.edu wrote:
2017-03-16 12:46:38.063709 7f605e746180 0 set uid:gid to 64045:64045 (ceph:ceph)
2017-03-16 12:46:38.063825 7f605e746180 0 ceph version 10.2.6 (656b5b63ed7c43bd014bcafd81b001959d5f089f), process ceph-mds, pid 10858
2017-03-16 12:46:39.755982 7f6057b62700 1 mds.mds01.hep.wisc.edu handle_mds_map standby
2017-03-16 12:46:39.898430 7f6057b62700 1 mds.0.4072 handle_mds_map i am now mds.0.4072
2017-03-16 12:46:39.898437 7f6057b62700 1 mds.0.4072 handle_mds_map state change up:boot --> up:replay
2017-03-16 12:46:39.898459 7f6057b62700 1 mds.0.4072 replay_start
2017-03-16 12:46:39.898466 7f6057b62700 1 mds.0.4072 recovery set is
2017-03-16 12:46:39.898475 7f6057b62700 1 mds.0.4072 waiting for osdmap 253396 (which blacklists prior instance)
2017-03-16 12:46:40.227204 7f6052956700 0 mds.0.cache creating system inode with ino:100
2017-03-16 12:46:40.227569 7f6052956700 0 mds.0.cache creating system inode with ino:1
2017-03-16 12:46:40.954494 7f6050d48700 1 mds.0.4072 replay_done
2017-03-16 12:46:40.954526 7f6050d48700 1 mds.0.4072 making mds journal writeable
2017-03-16 12:46:42.211070 7f6057b62700 1 mds.0.4072 handle_mds_map i am now mds.0.4072
2017-03-16 12:46:42.211074 7f6057b62700 1 mds.0.4072 handle_mds_map state change up:replay --> up:reconnect
2017-03-16 12:46:42.211094 7f6057b62700 1 mds.0.4072 reconnect_start
2017-03-16 12:46:42.211098 7f6057b62700 1 mds.0.4072 reopen_log
2017-03-16 12:46:42.211105 7f6057b62700 1 mds.0.server reconnect_clients -- 5 sessions
2017-03-16 12:47:28.502417 7f605535d700 1 mds.0.server reconnect gave up on client.14384220 10.128.198.55:0/2012593454
2017-03-16 12:47:28.505126 7f605535d700 -1 ./include/interval_set.h: In function 'void interval_set::insert(T, T, T*, T*) [with T = inodeno_t]' thread 7f605535d700 time 2017-03-16 12:47:28.502496
./include/interval_set.h: 355: FAILED assert(0)
ceph version 10.2.6 (656b5b63ed7c43bd014bcafd81b001959d5f089f)
1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x82) [0x7f605e248ed2]
2: (()+0x1ea5fe) [0x7f605de245fe]
3: (InoTable::project_release_ids(interval_set&)+0x917) [0x7f605e065ad7]
4: (Server::journal_close_session(Session*, int, Context*)+0x18e) [0x7f605de89f1e]
5: (Server::kill_session(Session*, Context*)+0x133) [0x7f605de8bf23]
6: (Server::reconnect_tick()+0x148) [0x7f605de8d378]
7: (MDSRankDispatcher::tick()+0x389) [0x7f605de524d9]
8: (Context::complete(int)+0x9) [0x7f605de3fcd9]
9: (SafeTimer::timer_thread()+0x104) [0x7f605e239e84]
10: (SafeTimerThread::entry()+0xd) [0x7f605e23ad2d]
11: (()+0x8064) [0x7f605d53d064]
12: (clone()+0x6d) [0x7f605ba8262d]
NOTE: a copy of the executable, or `objdump -rdS ` is needed to interpret this.
--- begin dump of recent events ---
-263> 2017-03-16 12:46:38.056353 7f605e746180 5 asok(0x7f6068a2a000) register_command perfcounters_dump hook 0x7f6068a06030
-262> 2017-03-16 12:46:38.056425 7f605e746180 5 asok(0x7f6068a2a000) register_command 1 hook 0x7f6068a06030
-261> 2017-03-16 12:46:38.056431 7f605e746180 5 asok(0x7f6068a2a000) register_command perf dump hook 0x7f6068a06030
-260> 2017-03-16 12:46:38.056434 7f605e746180 5 asok(0x7f6068a2a000) register_command perfcounters_schema hook 0x7f6068a06030
-259> 2017-03-16 12:46:38.056437 7f605e746180 5 asok(0x7f6068a2a000) register_command 2 hook 0x7f6068a06030
-258> 2017-03-16 12:46:38.056440 7f605e746180 5 asok(0x7f6068a2a000) register_command perf schema hook 0x7f6068a06030
-257> 2017-03-16 12:46:38.056444 7f605e746180 5 asok(0x7f6068a2a000) register_command perf reset hook 0x7f6068a06030
-256> 2017-03-16 12:46:38.056448 7f605e746180 5 asok(0x7f6068a2a000) register_command config show hook 0x7f6068a06030
-255> 2017-03-16 12:46:38.056457 7f605e746180 5 asok(0x7f6068a2a000) register_command config set hook 0x7f6068a06030
-254> 2017-03-16 12:46:38.056461 7f605e746180 5 asok(0x7f6068a2a000) register_command config get hook 0x7f6068a06030
-253> 2017-03-16 12:46:38.056464 7f605e746180 5 asok(0x7f6068a2a000) register_command config diff hook 0x7f6068a06030
-252> 2017-03-16 12:46:38.056466 7f605e746180 5 asok(0x7f6068a2a000) register_command log flush hook 0x7f6068a06030
-251> 2017-03-16 12:46:38.056469 7f605e746180 5 asok(0x7f6068a2a000) register_command log dump hook 0x7f6068a06030
-250> 2017-03-16 12:46:38.056472 7f605e746180 5 asok(0x7f6068a2a000) register_command log reopen hook 0x7f6068a06030
-249> 2017-03-16 12:46:38.063709 7f605e746180 0 set uid:gid to 64045:64045 (ceph:ceph)
-248> 2017-03-16 12:46:38.063825 7f605e746180 0 ceph version 10.2.6 (656b5b63ed7c43bd014bcafd81b001959d5f089f), process ceph-mds, pid 10858
Re: [ceph-users] removing ceph.quota.max_bytes
Thanks! Seems non-standard, but it works. :) C. >> Anyone know what's wrong? > You can clear these by setting them to zero. > John >> Everything is Jewel 10.2.5. Thanks! Chad. ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
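In other words, the working command is roughly (path hypothetical):

setfattr -n ceph.quota.max_bytes -v 0 /ceph/cephfs

A value of 0 is treated as "no quota", which is why 'setfattr -x' isn't needed (and just reports "No such attribute").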
[ceph-users] removing ceph.quota.max_bytes
Hi All, I'm trying to remove the extended attribute "ceph.quota.max_bytes" on a cephfs directory. I've fuse mounted a subdirectory of a cephfs filesystem under /ceph/cephfs. Next I set "ceph.quota.max_bytes":
setfattr -n ceph.quota.max_bytes -v 123456 /ceph/cephfs
And check the attribute:
getfattr -n ceph.quota.max_bytes /ceph/cephfs
getfattr: Removing leading '/' from absolute path names
# file: ceph/cephfs
ceph.quota.max_bytes="123456"
Then I try to remove:
setfattr -x ceph.quota.max_bytes /ceph/cephfs/
setfattr: /ceph/cephfs/: No such attribute
Anyone know what's wrong? Everything is Jewel 10.2.5. Thanks! Chad. ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] 10.2.5 on Jessie?
Thanks ceph@jack and Alexandre for the reassurance! C. On 12/20/2016 08:37 PM, Alexandre DERUMIER wrote: I have upgraded 3 jewel clusters on jessie to the latest 10.2.5, works fine. - Original Message - From: "Chad William Seys" <cws...@physics.wisc.edu> To: "ceph-users" <ceph-us...@ceph.com> Sent: Tuesday, December 20, 2016 17:31:49 Subject: [ceph-users] 10.2.5 on Jessie? Hi all, Has anyone had success/problems with 10.2.5 on Jessie? I'm being a little cautious before updating. ;) Thanks! Chad. ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
[ceph-users] 10.2.5 on Jessie?
Hi all, Has anyone had success/problems with 10.2.5 on Jessie? I'm being a little cautious before updating. ;) Thanks! Chad. ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
[ceph-users] new feature: auto removal of osds causing "stuck inactive"
Hi all, I recently encountered a situation where some partially removed OSDs caused my cluster to enter a "stuck inactive" state. The eventual solution was to tell ceph the OSDs were "lost". Because all the PGs were replicated elsewhere on the cluster, no data was lost. Would it make sense or be possible for Ceph to automatically detect this situation ("stuck inactive" and PGs replicated elsewhere) and automatically take action to un-stuck the cluster? E.g. automatically mark the OSD as lost (the manual command is sketched below) or cause the OSD to be down and out to have the same effect? Ideally anything that can be safely automated should be. :) Thanks! C. ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
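For reference, the manual step being described is roughly the following (the OSD id is hypothetical):

ceph osd lost 12 --yes-i-really-mean-it

possibly after first marking it down and out with 'ceph osd down 12' and 'ceph osd out 12'.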
Re: [ceph-users] Blocked ops, OSD consuming memory, hammer
Hi Heath, My OSDs do the exact same thing - consume lots of RAM when the cluster is reshuffling OSDs. Try ceph tell osd.* heap release as a cron job. Here's a bug: http://tracker.ceph.com/issues/12681 Chad ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
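For example, something like this in /etc/cron.d keeps the heap from growing unchecked (the schedule is arbitrary):

0 3 * * * root ceph tell 'osd.*' heap release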
Re: [ceph-users] is 0.94.7 packaged well for Debian Jessie
Thanks! Hammer doesn't use systemd unit files, so it's working fine. (jewel/infernalis are still missing systemd .target files.) ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
[ceph-users] is 0.94.7 packaged well for Debian Jessie
Hi All, Has anyone tested 0.94.7 on Debian Jessie? I've heard that the most recent Jewel releases for Jessie were missing pieces (systemd files) so I am a little more hesitant than usual. Thanks! Chad. ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
[ceph-users] RE; upgraded to Ubuntu 16.04, getting assert failure
Hi Don, I had a similar problem starting a mon. In my case a computer failed and I removed and recreated the 3rd mon on a new computer. It would start but never get added to the other mons' lists. Restarting the other two mons caused them to add the third to their monmap. Good luck! Chad. ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
[ceph-users] ceph-deploy not in debian repo?
Hi all, I cannot find ceph-deploy in the Debian catalogs. I have these in my sources:
deb http://ceph.com/debian-hammer/ jessie main
# ceph-deploy not yet in jessie repo
deb http://ceph.com/debian-hammer wheezy main
I also see ceph-deploy in the repo: http://download.ceph.com/debian/pool/main/c/ceph-deploy/ So is it just not listed in the Contents files? Thanks, Chad. ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
[ceph-users] copying files from one pool to another results in more free space?
Hi All, I'm observing some weird behavior in the amount of space ceph reports while copying files from an rbd image in one pool to an rbd image in another. The AVAIL number reported by 'ceph df' goes up as the copy proceeds rather than down! The output of 'ceph df' shows that the AVAIL space is 9219G initially and then, after the copy has proceeded for some time, shows 9927G. (See below.) Ceph is somehow reclaiming ~700GB of space when it should only be losing space! I like it! :) Some details: I ran fstrim on both the source and destination mount points. The data is being copied from tibs/tibs-ecpool to 3-replica/3-replica-ec. tibs is a 3 replica pool which is backed by tibs-ecpool, which is a k2m2 erasure coded pool. 3-replica/3-replica-ec is the same arrangement, but with fewer PGs. No data is being deleted. I see that USED in 3-replica/3-replica-ec is going up as expected. But the USED in both tibs/tibs-ecpool is going down. This appears to be where the space is being reclaimed. Question: why is a read of data causing it to take up less space? Thanks! Chad.
# ceph df
GLOBAL:
    SIZE       AVAIL     RAW USED     %RAW USED
    22908G     9219G     13689G       59.76
POOLS:
    NAME               ID     USED       %USED     MAX AVAIL     OBJECTS
    rbd                13     281        0         1990G         3
    tibs               22     72724M     0.31      1990G         612278
    tibs-ecpool        23     4555G      19.88     2985G         1166644
    cephfs_data        27     8          0         2985G         2
    cephfs_metadata    28     34800k     0         1990G         28
    3-replica          31     745G       3.25      1990G         5516809
    3-replica-ec       32     942G       4.12      2985G         241626
- copy progresses -
# ceph df
GLOBAL:
    SIZE       AVAIL     RAW USED     %RAW USED
    22908G     9927G     12980G       56.66
POOLS:
    NAME               ID     USED       %USED     MAX AVAIL     OBJECTS
    rbd                13     281        0         2284G         3
    tibs               22     68456M     0.29      2284G         88561
    tibs-ecpool        23     4227G      18.45     3427G         1082734
    cephfs_data        27     8          0         3427G         2
    cephfs_metadata    28     34832k     0         2284G         28
    3-replica          31     745G       3.25      2284G         2676207
    3-replica-ec       32     945G       4.13      3427G         242194
___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] Correct method to deploy on jessie
> Most users in the apt family have deployed on Ubuntu > though, and that's what our tests run on, fyi. That is good to know - I wouldn't be surprised if the same packages could be used in Ubuntu and Debian. Especially if the release dates of the Ubuntu and Debian versions were similar. Thanks! Chad. ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] Correct method to deploy on jessie
Hi Dmitry, You might try using the wheezy repos on jessie. Often this will work. (I'm using wheezy for most of my ceph nodes, but not two of the three monitor nodes, which are jessie with wheezy repos.) # Wheezy repos on Jessie deb http://ceph.com/debian-hammer/ wheezy main Alternatively Jessie repo: deb http://gitbuilder.ceph.com/ceph-deb-jessie-x86_64-basic/ref/v0.94.3 jessie main But read this thread for tips: http://lists.ceph.com/pipermail/ceph-users-ceph.com/2015-September/004142.html E.g. > The thing is: whatever I write into ceph.list, ceph-deploy just > overwrites it with "deb http://ceph.com/debian-hammer/ jessie main" > which does not exist :( [...] > you can specify any repository you like with 'ceph-deploy install > --repo-url ', given you have the repo keys installed. Chad. ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
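As an illustration of the --repo-url workaround mentioned above (the URL, key URL and hostname here are made up for the example):

ceph-deploy install --repo-url http://ceph.com/debian-hammer --gpg-url https://download.ceph.com/keys/release.asc node1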
Re: [ceph-users] Huge memory usage spike in OSD on hammer/giant
> note that I've only did it after most of pg were recovered My guess / hope is that heap free would also help during the recovery process. Recovery causing failures does not seem like the best outcome. :) C. ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] Huge memory usage spike in OSD on hammer/giant
> Going from 2GB to 8GB is not normal, although some slight bloating is > expected. If I recall correctly, Mariusz's cluster had a period of flapping OSDs? I experienced a similar situation using hammer. My OSDs went from 10GB of RAM in a healthy state to 24GB RAM + 10GB swap in a recovering state. I also could not re-add a node b/c every time I tried, OOM killer would kill an OSD daemon somewhere before the cluster could become healthy again. Therefore I propose we begin expecting bloating under these circumstances. :) > In your case it just got much worse than usual for reasons yet > unknown. Not really unknown: b/c 'ceph tell osd.* heap release' freed RAM for Mariusz, I think we know the reason for so much RAM use is tcmalloc not freeing unused memory. Right? Here is a related "urgent" and "won't fix" bug which applies: http://tracker.ceph.com/issues/12681 . Sage suggests making the heap release command a cron job. :) Have fun! Chad. ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] Huge memory usage spike in OSD on hammer/giant
On Tuesday, September 08, 2015 18:28:48 Shinobu Kinjo wrote: > Have you ever? > > http://ceph.com/docs/master/rados/troubleshooting/memory-profiling/ No. But the command 'ceph tell osd.* heap release' did cause my OSDs to consume the "normal" amount of RAM. ("normal" in this case means the same amount of RAM as before my cluster went through a recovery phase.) Chad. ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] RAM usage only very slowly decreases after cluster recovery
Thanks Somnath! I found a bug in the tracker to follow: http://tracker.ceph.com/issues/12681 Chad. ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] Huge memory usage spike in OSD on hammer/giant
Does 'ceph tell osd.* heap release' help with OSD RAM usage? From http://lists.ceph.com/pipermail/ceph-users-ceph.com/2015-August/003932.html Chad. ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] RAM usage only very slowly decreases after cluster recovery
Thanks! 'ceph tell osd.* heap release' seems to have worked! Guess I'll sprinkle it around my maintenance scripts. Somnath, is there a plan to make jemalloc standard in Ceph in the future? Thanks! Chad. ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
[ceph-users] RAM usage only very slowly decreases after cluster recovery
Hi all, It appears that OSD daemons only very slowly free RAM after an extended period of an unhealthy cluster (shuffling PGs around). Prior to a power outage (and recovery) around July 25th, the amount of RAM used was fairly constant, at most 10GB (out of 24GB). You can see in the attached PNG osd6_stack2.png (Week 30) that the amount of used RAM on osd06.physics.wisc.edu was holding steady around 7GB. Around July 25th our Ceph cluster rebooted after a power outage. Not all nodes booted successfully, so Ceph proceeded to shuffle PGs to attempt to return to health with the remaining nodes. You can see in osd6_stack2.png two purplish spikes showing that the node used around 10GB of swap space during the recovery period. Finally the cluster recovered around July 31st. During that period I had to take some osd daemons out of the pool b/c their nodes ran out of swap space and the daemons were killed by the out of memory (OOM) kernel feature. (The recovery period was probably extended by me trying to add the daemons/drives back. If I recall correctly that is what was occurring during the second swap peak.) This RAM usage pattern is in general the same for all the nodes in the cluster. Almost three weeks later, the amount of RAM used on the node is still decreasing, but it has not returned to pre-power-outage levels: 15GB instead of 7GB. Why is Ceph using 2x more RAM than it used to in steady state? Thanks, Chad. (P.S. It is really unfortunate that Ceph uses more RAM when recovering - can lead to cascading failure!) ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] TRIM / DISCARD run at low priority by the OSDs?
Hi Alexandre, Thanks for the note. I was not clear enough. The fstrim I was running was only on the krbd mountpoints. The backend OSDs only have standard hard disks, not SSDs, so they don't need to be trimmed. Instead I was reclaiming free space as reported by Ceph. Running fstrim on the rbd mountpoints caused the OSDs to become very busy, affecting all rbds, not just those being trimmed. I was hoping someone had an idea of how to make the OSDs not become busy while running fstrim on the rbd mountpoints. E.g. if Ceph made a distinction between trim operations on RBDs and other types, it could give those operations lower priority. Thanks again! Chad. On Monday, August 24, 2015 18:26:30 you wrote: Hi, I'm not sure about krbd, but with librbd, using trim/discard on the client doesn't do trim/discard on the osd physical disk. It simply writes zeroes in the rbd image. Zero writes can be skipped since this commit (librbd related) https://github.com/xiaoxichen/ceph/commit/e7812b8416012141cf8faef577e7b27e1b29d5e3 +OPTION(rbd_skip_partial_discard, OPT_BOOL, false) Then you can still manage fstrim manually on the osd servers - Original Message - From: Chad William Seys cws...@physics.wisc.edu To: ceph-users ceph-us...@ceph.com Sent: Saturday, August 22, 2015 04:26:38 Subject: [ceph-users] TRIM / DISCARD run at low priority by the OSDs? Hi All, Is it possible to give TRIM / DISCARD initiated by krbd low priority on the OSDs? I know it is possible to run fstrim at idle priority on the rbd mount point, e.g. ionice -c Idle fstrim -v $MOUNT. But this idle priority (it appears) only applies within the context of the node executing fstrim. Even if the node executing fstrim is idle, the OSDs are very busy and performance suffers. Is it possible to tell the OSD daemons (or whatever) to perform the TRIMs at low priority also? Thanks! Chad. ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
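One purely client-side mitigation (it does not change any OSD-side priority) is to pace the trims, along these lines; the mount points and sleep interval here are made up:

for m in /srv/smb /srv/mirror; do
    ionice -c3 nice -n 19 fstrim -v "$m"
    sleep 600   # let the cluster settle between mount points
done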
[ceph-users] TRIM / DISCARD run at low priority by the OSDs?
Hi All, Is it possible to give TRIM / DISCARD initiated by krbd low priority on the OSDs? I know it is possible to run fstrim at idle priority on the rbd mount point, e.g. ionice -c Idle fstrim -v $MOUNT. But this idle priority (it appears) only applies within the context of the node executing fstrim. Even if the node executing fstrim is idle, the OSDs are very busy and performance suffers. Is it possible to tell the OSD daemons (or whatever) to perform the TRIMs at low priority also? Thanks! Chad. ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] why are there degraded PGs when adding OSDs?
Hi Sam, The pg might also be degraded right after a map change which changes the up/acting sets since the few objects updated right before the map change might be new on some replicas and old on the other replicas. While in that state, those specific objects are degraded, and the pg would report degraded until they are recovered (which would happen asap, prior to backfilling the new replica). -Sam That sounds like only a few PGs should be degraded. I instead have about 45% (and higher earlier).
# ceph -s
cluster 7797e50e-f4b3-42f6-8454-2e2b19fa41d6
health HEALTH_WARN
2081 pgs backfill
6745 pgs degraded
17 pgs recovering
6728 pgs recovery_wait
6745 pgs stuck degraded
8826 pgs stuck unclean
recovery 2530124/5557452 objects degraded (45.527%)
recovery 33594/5557452 objects misplaced (0.604%)
monmap e5: 3 mons at {mon01=128.104.164.197:6789/0,mon02=128.104.164.198:6789/0,mon03=10.128.198.51:6789/0}
election epoch 16458, quorum 0,1,2 mon03,mon01,mon02
mdsmap e3032: 1/1/1 up {0=mds01.hep.wisc.edu=up:active}
osdmap e149761: 27 osds: 27 up, 27 in; 2083 remapped pgs
pgmap v13464928: 18432 pgs, 9 pools, 5401 GB data, 1364 kobjects
11122 GB used, 11786 GB / 22908 GB avail
2530124/5557452 objects degraded (45.527%)
33594/5557452 objects misplaced (0.604%)
9606 active+clean
6726 active+recovery_wait+degraded
2081 active+remapped+wait_backfill
17 active+recovering+degraded
2 active+recovery_wait+degraded+remapped
recovery io 24861 kB/s, 6 objects/s
Chad.
- Original Message - From: Chad William Seys cws...@physics.wisc.edu To: ceph-users ceph-us...@ceph.com Sent: Monday, July 27, 2015 12:27:26 PM Subject: [ceph-users] why are there degraded PGs when adding OSDs?
Hi All, I recently added some OSDs to the Ceph cluster (0.94.2). I noticed that 'ceph -s' reported both misplaced AND degraded PGs. Why should any PGs become degraded? Seems as though Ceph should only be reporting misplaced PGs? From the Giant release notes: Degraded vs misplaced: the Ceph health reports from ‘ceph -s’ and related commands now make a distinction between data that is degraded (there are fewer than the desired number of copies) and data that is misplaced (stored in the wrong location in the cluster). The distinction is important because the latter does not compromise data safety. Does Ceph delete some replicas of the PGs (leading to degradation) before re-replicating on the new OSD? This does not seem to be the safest algorithm. ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
[ceph-users] why are there degraded PGs when adding OSDs?
Hi All, I recently added some OSDs to the Ceph cluster (0.94.2). I noticed that 'ceph -s' reported both misplaced AND degraded PGs. Why should any PGs become degraded? Seems as though Ceph should only be reporting misplaced PGs? From the Giant release notes: Degraded vs misplaced: the Ceph health reports from ‘ceph -s’ and related commands now make a distinction between data that is degraded (there are fewer than the desired number of copies) and data that is misplaced (stored in the wrong location in the cluster). The distinction is important because the latter does not compromise data safety. Does Ceph delete some replicas of the PGs (leading to degradation) before re- replicating on the new OSD? This does not seem to be the safest algorithm. ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] why are there degraded PGs when adding OSDs?
Hi Sam, I'll need help getting the osdmap and pg dump prior to addition. I can remove the OSDs and add again if the osdmap (etc.) is not logged somewhere. Chad. Hmm, that's odd. Can you attach the osdmap and ceph pg dump prior to the addition (with all pgs active+clean), then the osdmap and ceph pg dump afterwards? -Sam - Original Message - From: Chad William Seys cws...@physics.wisc.edu To: Samuel Just sj...@redhat.com, ceph-users ceph-us...@ceph.com Sent: Monday, July 27, 2015 12:57:23 PM Subject: Re: [ceph-users] why are there degraded PGs when adding OSDs? Hi Sam, The pg might also be degraded right after a map change which changes the up/acting sets since the few objects updated right before the map change might be new on some replicas and old on the other replicas. While in that state, those specific objects are degraded, and the pg would report degraded until they are recovered (which would happen asap, prior to backfilling the new replica). -Sam That sounds like only a few PGs should be degraded. I instead have about 45% (and higher earlier). # ceph -s cluster 7797e50e-f4b3-42f6-8454-2e2b19fa41d6 health HEALTH_WARN 2081 pgs backfill 6745 pgs degraded 17 pgs recovering 6728 pgs recovery_wait 6745 pgs stuck degraded 8826 pgs stuck unclean recovery 2530124/5557452 objects degraded (45.527%) recovery 33594/5557452 objects misplaced (0.604%) monmap e5: 3 mons at {mon01=128.104.164.197:6789/0,mon02=128.104.164.198:6789/0,mon03=10.128.198. 51:6789/0} election epoch 16458, quorum 0,1,2 mon03,mon01,mon02 mdsmap e3032: 1/1/1 up {0=mds01.hep.wisc.edu=up:active} osdmap e149761: 27 osds: 27 up, 27 in; 2083 remapped pgs pgmap v13464928: 18432 pgs, 9 pools, 5401 GB data, 1364 kobjects 11122 GB used, 11786 GB / 22908 GB avail 2530124/5557452 objects degraded (45.527%) 33594/5557452 objects misplaced (0.604%) 9606 active+clean 6726 active+recovery_wait+degraded 2081 active+remapped+wait_backfill 17 active+recovering+degraded 2 active+recovery_wait+degraded+remapped recovery io 24861 kB/s, 6 objects/s Chad. - Original Message - From: Chad William Seys cws...@physics.wisc.edu To: ceph-users ceph-us...@ceph.com Sent: Monday, July 27, 2015 12:27:26 PM Subject: [ceph-users] why are there degraded PGs when adding OSDs? Hi All, I recently added some OSDs to the Ceph cluster (0.94.2). I noticed that 'ceph -s' reported both misplaced AND degraded PGs. Why should any PGs become degraded? Seems as though Ceph should only be reporting misplaced PGs? From the Giant release notes: Degraded vs misplaced: the Ceph health reports from ‘ceph -s’ and related commands now make a distinction between data that is degraded (there are fewer than the desired number of copies) and data that is misplaced (stored in the wrong location in the cluster). The distinction is important because the latter does not compromise data safety. Does Ceph delete some replicas of the PGs (leading to degradation) before re- replicating on the new OSD? This does not seem to be the safest algorithm. ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] why are there degraded PGs when adding OSDs?
Hi Sam, I think I may have the problem: I noticed that the new host was created with straw2 instead of straw. Would this account for 50% of PGs being degraded? (I'm removing the OSDs on that host and will recreate with 'firefly' tunables.) Thanks! Chad. On Monday, July 27, 2015 15:09:21 Chad William Seys wrote: Hi Sam, I'll need help getting the osdmap and pg dump prior to addition. I can remove the OSDs and add again if the osdmap (etc.) is not logged somewhere. Chad. Hmm, that's odd. Can you attach the osdmap and ceph pg dump prior to the addition (with all pgs active+clean), then the osdmap and ceph pg dump afterwards? -Sam - Original Message - From: Chad William Seys cws...@physics.wisc.edu To: Samuel Just sj...@redhat.com, ceph-users ceph-us...@ceph.com Sent: Monday, July 27, 2015 12:57:23 PM Subject: Re: [ceph-users] why are there degraded PGs when adding OSDs? Hi Sam, The pg might also be degraded right after a map change which changes the up/acting sets since the few objects updated right before the map change might be new on some replicas and old on the other replicas. While in that state, those specific objects are degraded, and the pg would report degraded until they are recovered (which would happen asap, prior to backfilling the new replica). -Sam That sounds like only a few PGs should be degraded. I instead have about 45% (and higher earlier). # ceph -s cluster 7797e50e-f4b3-42f6-8454-2e2b19fa41d6 health HEALTH_WARN 2081 pgs backfill 6745 pgs degraded 17 pgs recovering 6728 pgs recovery_wait 6745 pgs stuck degraded 8826 pgs stuck unclean recovery 2530124/5557452 objects degraded (45.527%) recovery 33594/5557452 objects misplaced (0.604%) monmap e5: 3 mons at {mon01=128.104.164.197:6789/0,mon02=128.104.164.198:6789/0,mon03=10.128.19 8. 51:6789/0} election epoch 16458, quorum 0,1,2 mon03,mon01,mon02 mdsmap e3032: 1/1/1 up {0=mds01.hep.wisc.edu=up:active} osdmap e149761: 27 osds: 27 up, 27 in; 2083 remapped pgs pgmap v13464928: 18432 pgs, 9 pools, 5401 GB data, 1364 kobjects 11122 GB used, 11786 GB / 22908 GB avail 2530124/5557452 objects degraded (45.527%) 33594/5557452 objects misplaced (0.604%) 9606 active+clean 6726 active+recovery_wait+degraded 2081 active+remapped+wait_backfill 17 active+recovering+degraded 2 active+recovery_wait+degraded+remapped recovery io 24861 kB/s, 6 objects/s Chad. - Original Message - From: Chad William Seys cws...@physics.wisc.edu To: ceph-users ceph-us...@ceph.com Sent: Monday, July 27, 2015 12:27:26 PM Subject: [ceph-users] why are there degraded PGs when adding OSDs? Hi All, I recently added some OSDs to the Ceph cluster (0.94.2). I noticed that 'ceph -s' reported both misplaced AND degraded PGs. Why should any PGs become degraded? Seems as though Ceph should only be reporting misplaced PGs? From the Giant release notes: Degraded vs misplaced: the Ceph health reports from ‘ceph -s’ and related commands now make a distinction between data that is degraded (there are fewer than the desired number of copies) and data that is misplaced (stored in the wrong location in the cluster). The distinction is important because the latter does not compromise data safety. Does Ceph delete some replicas of the PGs (leading to degradation) before re- replicating on the new OSD? This does not seem to be the safest algorithm. ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] kernel version for rbd client and hammer tunables
Hi Ilya and all, Thanks for explaining. I'm confused about what building a crushmap means. After running #ceph osd crush tunables hammer data migrated around the cluster, so something changed. I was expecting that 'straw' would be replaced by 'straw2'. (Unfortunately I did not dump the crushmap prior to setting tunables to hammer, so I don't know what change did occur.) So I guess setting tunables to hammer is not building a crushmap. Could you give examples? Would creating a new pool on the cluster now use straw2? Thanks! Chad. ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
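For anyone wanting to see what actually changed, the crush map can be dumped, decompiled, and inspected for the bucket algorithm (file names here are arbitrary):

ceph osd getcrushmap -o crush.bin
crushtool -d crush.bin -o crush.txt
grep 'alg ' crush.txt

Comparing a dump taken before and after 'ceph osd crush tunables hammer' would show whether any 'alg straw' buckets became 'alg straw2'.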
Re: [ceph-users] kernel version for rbd client and hammer tunables
No, pools use crush rulesets. straw and straw2 are bucket types (or algorithms). As an example, if you do ceph osd crush add-bucket foo rack on a cluster with firefly tunables, you will get a new straw bucket. The same after doing ceph osd crush tunables hammer will get you a new straw2 bucket, with the rest of your buckets remaining unaffected. straw buckets are not going to be replaced with straw2 buckets, that's something you as an administrator can make a choice to do. Ah, I see now that 'alg straw' in my crushmap is in the osdXX groups. What happens if I run #ceph osd crush tunables hammer and then add a new OSD? Thanks again! Chad. ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
[ceph-users] kernel version for rbd client and hammer tunables
Hi Ilya and all, Is it safe to use kernel 3.16.7 rbd with Hammer tunables? I've tried this on a test Hammer cluster and the client seems to work fine. I've also mounted cephfs on a Hammer cluster (and Hammer tunables) using kernel 3.16. It seems to work fine (but not much testing). I remember recently someone asking about mounting cephfs with Hammer tunables and it was stated that kernel 4.1 was needed for mounting. Is the more correct statement that mounting is possible but problems will exist? Thanks! Chad. ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
[ceph-users] how to display client io in hammer
Hi all, Looks like in Hammer 'ceph -s' no longer displays client IO and ops. How does one display that these days? Thanks, C. ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] how to display client io in hammer
Ooops! Turns out I forgot to mount the ceph rbd, so no client IO displayed! C. ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] Kernel version for CephFS client ?
Hi Florent, Most likely Debian will release backported kernels for Jessie, as they have for Wheezy. E.g. Wheezy has had kernel 3.16 backported to it: https://packages.debian.org/search?suite=wheezy-backportssearchon=nameskeywords=linux-image-amd64 C. ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
[ceph-users] ceph.com documentation suggestions
Hi, I've recently seen some confusion over the number of PGs per pool versus per cluster on the mailing list. I also set too many PGs per pool b/c of this confusion. IMO, it is fairly confusing to talk about PGs on the Pool page, but only vaguely talk about the number of PGs for the cluster. Here are some examples of confusing statements with suggested alternatives from the online docs: http://ceph.com/docs/master/rados/operations/pools/ A typical configuration uses approximately 100 placement groups per OSD to provide optimal balancing without using up too many computing resources. - A typical configuration uses approximately 100 placement groups per OSD for all pools in the cluster to provide optimal balancing without using up too many computing resources. http://ceph.com/docs/master/rados/operations/placement-groups/ It is mandatory to choose the value of pg_num because it cannot be calculated automatically. Here are a few values commonly used: - It is mandatory to choose the value of pg_num. pg_num depends on the planned number of pools in the cluster. It cannot be determined automatically on pool creation. Please use this calculator: http://ceph.com/pgcalc/; Thanks! C. ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
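A worked example of the rule of thumb, using made-up numbers (27 OSDs, 3 replicas): the cluster-wide target is roughly 27 * 100 / 3 = 900 PGs, usually rounded to the next power of two (1024) and then split across all pools according to their expected share of the data, e.g.:
# echo $(( 27 * 100 / 3 ))
900
# ceph osd dump | grep pg_num
The second command lists what each existing pool is currently set to, which makes it easy to see whether the per-pool values already add up to more than the cluster-wide target.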
[ceph-users] advantages of multiple pools?
Hi All, What are the advantages of having multiple ceph pools (if they use the whole cluster)? Thanks! C. ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] ceph on Debian Jessie stopped working
Hi Greg, Thanks for the reply. After looking more closely at /etc/ceph/rbdmap I discovered it was corrupted. That was the only problem. I think the dmesg line 'rbd: no image name provided' is also a clue to this! Hope that helps any other newbies! :) Thanks again, Chad. ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
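In case it saves someone else the same hunt, a well-formed /etc/ceph/rbdmap has one mapping per line; a sketch, where the pool, image, and user names are only placeholders:
rbd/myimage    id=admin,keyring=/etc/ceph/ceph.client.admin.keyring
An entry that doesn't parse into pool/image plus options can produce exactly the kind of 'rbd: no image name provided' errors seen in dmesg.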
Re: [ceph-users] Upgrade from Giant 0.87-1 to Hammer 0.94-1
Now I also know I have too many PGs! It is fairly confusing to talk about PGs on the Pool page, but only vaguely talk about the number of PGs for the cluster. Here are some examples of confusing statements with suggested alternatives from the online docs: http://ceph.com/docs/master/rados/operations/pools/ A typical configuration uses approximately 100 placement groups per OSD to provide optimal balancing without using up too many computing resources. - A typical configuration uses approximately 100 placement groups per OSD for all pools to provide optimal balancing without using up too many computing resources. http://ceph.com/docs/master/rados/operations/placement-groups/ It is mandatory to choose the value of pg_num because it cannot be calculated automatically. Here are a few values commonly used: - It is mandatory to choose the value of pg_num. Because pg_num depends on the planned number of pools in the cluster, it cannot be determined automatically on pool creation. Please use this calculator: http://ceph.com/pgcalc/; Thanks! C. ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
[ceph-users] ceph on Debian Jessie stopped working
Hi All, Earlier ceph on Debian Jessie was working. Jessie is running 3.16.7. Now when I modprobe rbd, no /dev/rbd devices appear. # dmesg | grep -e rbd -e ceph [ 15.814423] Key type ceph registered [ 15.814461] libceph: loaded (mon/osd proto 15/24) [ 15.831092] rbd: loaded [ 22.084573] rbd: no image name provided [ 22.230176] rbd: no image name provided Some files appear under /sys: ls /sys/devices/rbd power uevent ceph-fuse /mnt/cephfs just hangs. I haven't changed the ceph config, but possibly there were package updates. I did install an earlier Jessie kernel from a machine which is still working and rebooted. No luck. Any ideas of what to check next? Thanks, Chad. ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
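A quick way to separate a kernel problem from a config problem is to map an image by hand and see whether a device appears. A sketch, assuming a pool called rbd and an image called myimage:
# modprobe rbd
# rbd map rbd/myimage --id admin
# rbd showmapped
/dev/rbd0 and friends only exist once an image is actually mapped, so an almost empty /sys/devices/rbd right after loading the module is normal.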
Re: [ceph-users] adding a new pool causes old pool warning pool x has too few pgs
Weird: After a few hours, the health check comes back OK without changing the number of PGs for any pool! Hi All, I recently added two pools to a healthy cluster: one replicated and one erasure coded. Then I made the replicated pool into a cache for the ecpool. Afterwards the ceph health check started complaining about a preexisting pool having too few pgs. Before adding the new pools there was no warning. Why does adding new pools cause an old pool to have too few pgs? Thanks! Chad. ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
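If the warning had persisted, the usual remedy is to compare per-pool object counts and then raise pg_num (and pgp_num) on the flagged pool; the pool name and the value 256 below are only placeholders:
# ceph df
# ceph osd pool set <poolname> pg_num 256
# ceph osd pool set <poolname> pgp_num 256
Note that pg_num can only be increased, never decreased, so it is worth checking the cluster-wide PG budget first.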
[ceph-users] PG stuck unclean for long time
Anyone know what is going on with this PG? # ceph health detail HEALTH_WARN 1 pgs stuck unclean; recovery 735/4844641 objects degraded (0.015%); 245/1296706 unfound (0.019%) pg 21.fd is stuck unclean for 349777.229468, current state active, last acting [19,5,15,25] recovery 735/4844641 objects degraded (0.015%); 245/1296706 unfound (0.019%) # ceph pg 21.fd query (output attached) Cluster history: two OSD on the same host were lost. Failure domain is host, so any PG with replicas 1 should not have been lost. The PG is from an erasure coded pool with k=2, m=2 . # ceph osd erasure-code-profile get k2m2 directory=/usr/lib/ceph/erasure-code k=2 m=2 plugin=jerasure ruleset-failure-domain=host technique=reed_sol_van Thanks for any insight! Chad.{ state: active+recovering, snap_trimq: [], epoch: 135269, up: [ 19, 5, 15, 25], acting: [ 19, 5, 15, 25], actingbackfill: [ 5(1), 15(2), 19(0), 25(3)], info: { pgid: 21.fds0, last_update: 134698'529, last_complete: 0'0, log_tail: 0'0, last_user_version: 6075, last_backfill: MAX, purged_snaps: [], history: { epoch_created: 130437, last_epoch_started: 135269, last_epoch_clean: 131847, last_epoch_split: 0, same_up_since: 135260, same_interval_since: 135267, same_primary_since: 135267, last_scrub: 0'0, last_scrub_stamp: 2015-01-22 17:00:57.846599, last_deep_scrub: 0'0, last_deep_scrub_stamp: 2015-01-22 17:00:57.846599, last_clean_scrub_stamp: 0.00}, stats: { version: 134698'529, reported_seq: 2245, reported_epoch: 135269, state: active, last_fresh: 2015-02-05 05:49:18.478995, last_change: 2015-02-05 05:49:18.478995, last_active: 2015-02-05 05:49:18.478995, last_clean: 2015-02-01 08:21:29.833949, last_became_active: 0.00, last_unstale: 2015-02-05 05:49:18.478995, mapping_epoch: 135266, log_start: 0'0, ondisk_log_start: 0'0, created: 130437, last_epoch_clean: 131847, parent: 0.0, parent_split_bits: 0, last_scrub: 0'0, last_scrub_stamp: 2015-01-22 17:00:57.846599, last_deep_scrub: 0'0, last_deep_scrub_stamp: 2015-01-22 17:00:57.846599, last_clean_scrub_stamp: 0.00, log_size: 529, ondisk_log_size: 529, stats_invalid: 0, stat_sum: { num_bytes: 2218786816, num_objects: 529, num_object_clones: 0, num_object_copies: 2116, num_objects_missing_on_primary: 245, num_objects_degraded: 735, num_objects_unfound: 245, num_objects_dirty: 529, num_whiteouts: 0, num_read: 0, num_read_kb: 0, num_write: 529, num_write_kb: 2166784, num_scrub_errors: 0, num_shallow_scrub_errors: 0, num_deep_scrub_errors: 0, num_objects_recovered: 0, num_bytes_recovered: 0, num_keys_recovered: 0, num_objects_omap: 0, num_objects_hit_set_archive: 0}, stat_cat_sum: {}, up: [ 19, 5, 15, 25], acting: [ 19, 5, 15, 25], up_primary: 19, acting_primary: 19}, empty: 0, dne: 0, incomplete: 0, last_epoch_started: 135269, hit_set_history: { current_last_update: 0'0, current_last_stamp: 0.00, current_info: { begin: 0.00, end: 0.00, version: 0'0}, history: []}}, peer_info: [ { peer: 5(1), pgid: 21.fds1, last_update: 134698'529, last_complete: 0'0, log_tail: 0'0, last_user_version: 6075, last_backfill: MAX, purged_snaps: [], history: { epoch_created: 130437, last_epoch_started: 135269, last_epoch_clean: 131847, last_epoch_split: 0, same_up_since: 135260, same_interval_since: 135267, same_primary_since: 135267, last_scrub: 0'0, last_scrub_stamp: 2015-01-22 17:00:57.846599, last_deep_scrub: 0'0, last_deep_scrub_stamp: 2015-01-22 17:00:57.846599, last_clean_scrub_stamp: 0.00}, stats: { version: 134698'529, reported_seq: 1685, reported_epoch: 135255, state: peering, last_fresh: 2015-02-04 16:43:12.517163, last_change: 
2015-02-04
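For the unfound objects themselves, the usual last-resort commands are below. They permanently give up on the listed objects, and revert may not be available for erasure-coded pools, so this is only a sketch of the options rather than a recommendation:
# ceph pg 21.fd list_missing
# ceph pg 21.fd mark_unfound_lost revert
# ceph pg 21.fd mark_unfound_lost delete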
[ceph-users] PG to pool mapping?
Hi all, How do I determine which pool a PG belongs to? (Also, is it the case that all objects in a PG belong to one pool?) Thanks! C. ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
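The short version, as far as I understand it: the number before the dot in a PG id is the pool id, so a PG only ever holds objects from that one pool. Using PG 21.fd from the earlier thread as an example:
# ceph osd lspools
# ceph pg map 21.fd
The first command lists pool ids alongside their names (21.fd therefore belongs to pool 21); the second shows where that PG currently maps.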
Re: [ceph-users] verifying tiered pool functioning
Hi Zhang, Thanks for the pointer. That page looks like the commands to set up the cache, not how to verify that it is working. I think I have been able to see objects (not PGs I guess) moving from the cache pool to the storage pool using 'rados df' . (I haven't run long enough to verify yet.) Thanks again! Chad. On Tuesday, January 27, 2015 03:47:53 you wrote: Do you mean cache tiering? You can refer to http://ceph.com/docs/master/rados/operations/cache-tiering/ for detail command line. PGs won't migrate from pool to pool. -Original Message- From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of Chad William Seys Sent: Thursday, January 22, 2015 5:40 AM To: ceph-users@lists.ceph.com Subject: [ceph-users] verifying tiered pool functioning Hello, Could anyone provide a howto verify that a tiered pool is working correctly? E.g. Command to watch as PG migrate from one pool to another? (Or determine which pool a PG is currently in.) Command to see how much data is in each pool (global view of number of PGs I guess)? Thanks! Chad. ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
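For watching it over time, per-pool counters are probably the easiest thing to poll. A sketch:
# watch -n 5 rados df
# ceph df detail
# ceph osd pool stats
Object counts rising in the backing pool while the cache pool's counts fall (or the reverse during promotion) is a reasonable sign the tier is flushing and promoting as intended.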
[ceph-users] cache pool and storage pool: possible to remove storage pool?
Hi all, Documentation explains how to remove the cache pool: http://ceph.com/docs/master/rados/operations/cache-tiering/ Anyone know how to remove the storage pool instead? (E.g. the storage pool has wrong parameters.) I was hoping to push all the objects into the cache pool and then replace storage pool. Thanks! Chad. ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] erasure coded pool why ever k>1?
Hi Loic, The size of each chunk is object size / K. If you have K=1 and M=2 it will be the same as 3 replicas with none of the advantages ;-) Interesting! I did not see this explained so explicitly. So is the general explanation of k and m something like: k, m: fault tolerance of m+1 replicas, space of 1/k*(m+k) replicas, plus slowness ? So one should never bother with k=1 b/c: k=1, m: fault tolerance of m+1, space of m+1 replicas, plus slowness. (therefore, just use m+1 replicas!) but k=2, m=1: might be useful instead of 2 replicas b/c it has fault tolerance of 2 replicas, space of 1/2*(1+2) = 3/2 = 1.5 replicas, plus slowness. And k=2, m=2: which should be as tolerant as 3 replicas, but take up as much space as (1/2)*(2+2)=2 replicas (right?). Thanks again! Chad. ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
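As a concrete sketch of the k=2, m=2 case (the profile name, pool name and PG counts below are made up; ruleset-failure-domain was the option name in this era), a pool that survives two host failures while using 2x raw space would be created like:
# ceph osd erasure-code-profile set ec22 k=2 m=2 ruleset-failure-domain=host
# ceph osd erasure-code-profile get ec22
# ceph osd pool create ecpool 128 128 erasure ec22
Raw usage is (k+m)/k of the stored data, so 4/2 = 2x here, and any m=2 of the 4 chunks can be lost.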
[ceph-users] how to remove storage tier
Hi all, I've got a tiered pool arrangement with a replicated pool and an erasure pool. I set it up such that the replicated pool is in front of the erasure coded pool. I now want to change the properties of the erasure coded pool. Is there a way of switching which erasure profile is used in the ec pool? (It looks possible to change the properties of the erasure profile using the --force option, but that is noted to be EXTREMELY DANGEROUS.) Possibly safer would be to push all the objects from the ec pool into the replicated pool, de-tier the pools, delete the ec pool, create a new ec pool with new properties, then re-tier the pools. Unfortunately, the documentation I find talks about how to drain the cache pool (replicated pool) rather than the other way around. Any ideas? Chad. ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
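One hedged possibility, sketched with placeholder names: create the replacement EC pool first and copy the objects into it with rados cppool, then detach and delete the old one. cppool copies objects serially, does not carry over snapshots, and the cache tier would need to be flushed and removed first, so treat this as something to try on scratch data rather than a proven recipe:
# ceph osd erasure-code-profile set newprofile k=4 m=2 ruleset-failure-domain=host
# ceph osd pool create ec_new 128 128 erasure newprofile
# rados cppool ec_old ec_new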
[ceph-users] erasure coded pool why ever k>1?
Hello all, What reasons would one want k>1? I read that m determines the number of OSDs which can fail before loss. But I don't see it explained how to choose k. Any benefits for choosing k>1? Thanks! Chad. ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
[ceph-users] verifying tiered pool functioning
Hello, Could anyone provide a howto verify that a tiered pool is working correctly? E.g. Command to watch as PG migrate from one pool to another? (Or determine which pool a PG is currently in.) Command to see how much data is in each pool (global view of number of PGs I guess)? Thanks! Chad. ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] Ceph PG Incomplete = Cluster unusable
Hi Christian, I had a similar problem about a month ago. After trying lots of helpful suggestions, I found none of it worked and I could only delete the affected pools and start over. I opened a feature request in the tracker: http://tracker.ceph.com/issues/10098 If you find a way, let us know! Chad. ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] emperor - firefly 0.80.7 upgrade problem
Hi Sam, Sounds like you needed osd 20. You can mark osd 20 lost. -Sam Does not work: # ceph osd lost 20 --yes-i-really-mean-it osd.20 is not down or doesn't exist Also, here is an interesting post from October which I will follow: http://lists.ceph.com/pipermail/ceph-users-ceph.com/2014-October/044059.html Hello, all. I got some advice from the IRC channel (thanks bloodice!) that I temporarily reduce the min_size of my cluster (size = 2) from 2 down to 1. That immediately caused all of my incomplete PGs to start recovering and everything seemed to come back OK. I was serving out an RBD from here and xfs_repair reported no problems. So... happy ending? What started this all was that I was altering my CRUSH map, causing significant rebalancing on my cluster, which had size = 2. During this process I lost an OSD (osd.10) and eventually ended up with incomplete PGs. Knowing that I only lost 1 OSD, I was pretty sure that I hadn't lost any data; I just couldn't get the PGs to recover without changing the min_size. It is good that this worked for him, but it also seems like a bug that it worked! (I.e. ceph should have been able to recover on its own without weird workarounds.) I'll let you know if this works for me! Thanks, Chad. ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
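For anyone repeating the min_size workaround, the rough sequence is below (the pool name rbd is only a placeholder). Running with min_size 1 means a single remaining copy is allowed to serve writes, so another failure during the recovery window can lose data; set it back as soon as recovery finishes:
# ceph osd pool get rbd min_size
# ceph osd pool set rbd min_size 1
  (wait for the PGs to go active+clean)
# ceph osd pool set rbd min_size 2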
Re: [ceph-users] create multiple OSDs without changing CRUSH until one last step
Hi Greg, Looks promising... I added [global] ... mon osd auto mark new in = false then pushed config to monitor ceph-deploy --overwrite-conf config push mon01 then restart monitor /etc/init.d/ceph restart mon then tried ceph-deploy --overwrite-conf disk prepare --zap-disk osd02:sde /dev/null but it still got added to the osd tree with up and weights which caused data redistribution. Did I miss something? Does ceph think this OSD is not new for some reason? (I have had OSDs with the same number before due to removes/adds...?) Thanks! Chad. ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
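Two other knobs that, as far as I know, keep freshly created OSDs from taking data until you are ready (both a sketch, not tested on this exact setup): the cluster-wide noin flag, and a zero initial crush weight in ceph.conf:
# ceph osd set noin
  [osd]
  osd crush initial weight = 0
With the latter, each new OSD appears in the tree with weight 0 and nothing moves until it is reweighted by hand, e.g. (osd id and weight are placeholders):
# ceph osd crush reweight osd.31 1.82
# ceph osd unset noin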
Re: [ceph-users] fuse or kernel to mount rbd?
Not to 3.2. I would recommend running a more recent Ubuntu kernel (which I *think* they support on 12.04 still) like 3.8 or 3.11. Those kernels should be pretty stable provided the Ubuntu kernel guys are keeping up with the mainline stable kernels at kernel.org (they generally do). Thanks! How stable would a testing Debian kernel be? (3.13?) Chad. ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com