[ceph-users] Checking cephfs compression is working
How do you confirm that cephfs files and rados objects are being compressed? I don't see how in the docs. ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
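There is no per-file flag to query, but one indirect check, assuming BlueStore OSDs with compression enabled at the pool level (the pool name cephfs_data below is a placeholder), is to confirm the pool settings and then look at the BlueStore compression counters on an OSD that hosts the pool's PGs; compressed_original versus compressed_allocated shows the ratio actually achieved:

ceph osd pool get cephfs_data compression_mode
ceph osd pool get cephfs_data compression_algorithm
# run on the OSD's host; repeat for a few OSDs
ceph daemon osd.0 perf dump | grep bluestore_compressed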
Re: [ceph-users] Recover files from cephfs data pool
Gotcha. Yah I think we are going continue the scanning to build a new metadata pool. I am making some progress on a script to extract files from the data store. Just need to find the exact format of the xattr's and the object hierarchy for large files. If I end up taking the script to the finish line this will be something I post for the community. So I am reading c source code at the moment to see what cephfs is doing. On Mon, Nov 5, 2018 at 8:10 PM Sergey Malinin wrote: > With cppool you got bunch of useless zero-sized objects because unlike > "export", cppool does not copy omap data which actually holds all the > inodes info. > I suggest truncating journals only for an effort of reducing downtime > followed by immediate backup of available files to a fresh fs. After > resetting journals the part of your fs covered by not flushed "UPDATE" > entries *will* become inconsistent. MDS may start to occasionally segfault > but it can be avoided by setting forced readonly mode (in this mode MDS > journal will not flush so you will need extra disk space). > If you want to get the original fs recovered and fully functional - you > need to somehow replay the journal (I'm unsure whether cephfs-data-scan > tool operates on journal entries). > > > > On 6.11.2018, at 03:43, Rhian Resnick wrote: > > Workload is mixed. > > We ran a rados cpool to backup the metadata pool. > > So your thinking that truncating journal and purge queue (we are luminous) > with a reset could bring us online missing just data from that day. (most > when the issue started) > > If so we could continue our scan into our recovery partition and give it a > try tomorrow after discussions with our recovery team. > > > > > On Mon, Nov 5, 2018 at 7:40 PM Sergey Malinin wrote: > >> What was your recent workload? There are chances not to lose much if it >> was mostly read ops. If such, you *must backup your metadata pool via >> "rados export" in order to preserve omap data*, then try truncating >> journals (along with purge queue if supported by your ceph version), wiping >> session table, and resetting the fs. >> >> >> On 6.11.2018, at 03:26, Rhian Resnick wrote: >> >> That was our original plan. So we migrated to bigger disks and have space >> but recover dentry uses up all our memory (128 GB) and crashes out. >> >> On Mon, Nov 5, 2018 at 7:23 PM Sergey Malinin wrote: >> >>> I had the same problem with multi-mds. I solved it by freeing up a >>> little space on OSDs, doing "recover dentries", truncating the journal, and >>> then "fs reset". After that I was able to revert to single-active MDS and >>> kept on running for a year until it failed on 13.2.2 upgrade :)) >>> >>> >>> On 6.11.2018, at 03:18, Rhian Resnick wrote: >>> >>> Our metadata pool went from 700 MB to 1 TB in size in a few hours. Used >>> all space on OSD and now 2 ranks report damage. The recovery tools on the >>> journal fail as they run out of memory leaving us with the option of >>> truncating the journal and loosing data or recovering using the scan tools. >>> >>> Any ideas on solutions are welcome. I posted all the logs and and >>> cluster design previously but am happy to do so again. We are not desperate >>> but we are hurting with this long downtime. >>> >>> On Mon, Nov 5, 2018 at 7:05 PM Sergey Malinin wrote: >>> >>>> What kind of damage have you had? Maybe it is worth trying to get MDS >>>> to start and backup valuable data instead of doing long running recovery? 
>>>> >>>> >>>> On 6.11.2018, at 02:59, Rhian Resnick wrote: >>>> >>>> Sounds like I get to have some fun tonight. >>>> >>>> On Mon, Nov 5, 2018, 6:39 PM Sergey Malinin >>> >>>>> inode linkage (i.e. folder hierarchy) and file names are stored in >>>>> omap data of objects in metadata pool. You can write a script that would >>>>> traverse through all the metadata pool to find out file names correspond >>>>> to >>>>> objects in data pool and fetch required files via 'rados get' command. >>>>> >>>>> > On 6.11.2018, at 02:26, Sergey Malinin wrote: >>>>> > >>>>> > Yes, 'rados -h'. >>>>> > >>>>> > >>>>> >> On 6.11.2018, at 02:25, Rhian Resnick wrote: >>>>> >> >>>>> >> Does a tool exist to recover files from a cephfs data partition? We >>>>> are rebuilding metadata but have a user who needs data asap. >>>>> >> ___ >>>>> >> ceph-users mailing list >>>>> >> ceph-users@lists.ceph.com >>>>> >> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com >>>>> > >>>>> >>>>> >>>> >>> >> > ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
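For what it's worth, here is a minimal sketch of the extraction idea discussed in this thread, assuming the default file layout (4 MB objects named <inode-hex>.<8-hex-digit-index>, so the first chunk of inode 0x10006cdc2c5 would be 10006cdc2c5.00000000) and a data pool called cephfs_data; it simply concatenates one inode's objects, stops at the first missing object, and therefore does not handle sparse files:

#!/bin/bash
# hypothetical helper, not a supported tool
# usage: ./extract-inode.sh <data-pool> <inode-hex> <output-file>
pool=$1; ino=$2; out=$3
i=0
: > "$out"
while true; do
    obj=$(printf '%s.%08x' "$ino" "$i")
    # stop when the next chunk does not exist (sparse files need smarter handling)
    rados -p "$pool" stat "$obj" >/dev/null 2>&1 || break
    rados -p "$pool" get "$obj" /tmp/chunk.$$
    cat /tmp/chunk.$$ >> "$out"
    i=$((i+1))
done
rm -f /tmp/chunk.$$

Mapping inode numbers back to file names still has to come from the metadata pool omap, as Sergey describes (e.g. rados -p <metadata-pool> listomapkeys <directory-object>).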
Re: [ceph-users] Recover files from cephfs data pool
Workload is mixed. We ran a rados cpool to backup the metadata pool. So your thinking that truncating journal and purge queue (we are luminous) with a reset could bring us online missing just data from that day. (most when the issue started) If so we could continue our scan into our recovery partition and give it a try tomorrow after discussions with our recovery team. On Mon, Nov 5, 2018 at 7:40 PM Sergey Malinin wrote: > What was your recent workload? There are chances not to lose much if it > was mostly read ops. If such, you *must backup your metadata pool via > "rados export" in order to preserve omap data*, then try truncating > journals (along with purge queue if supported by your ceph version), wiping > session table, and resetting the fs. > > > On 6.11.2018, at 03:26, Rhian Resnick wrote: > > That was our original plan. So we migrated to bigger disks and have space > but recover dentry uses up all our memory (128 GB) and crashes out. > > On Mon, Nov 5, 2018 at 7:23 PM Sergey Malinin wrote: > >> I had the same problem with multi-mds. I solved it by freeing up a little >> space on OSDs, doing "recover dentries", truncating the journal, and then >> "fs reset". After that I was able to revert to single-active MDS and kept >> on running for a year until it failed on 13.2.2 upgrade :)) >> >> >> On 6.11.2018, at 03:18, Rhian Resnick wrote: >> >> Our metadata pool went from 700 MB to 1 TB in size in a few hours. Used >> all space on OSD and now 2 ranks report damage. The recovery tools on the >> journal fail as they run out of memory leaving us with the option of >> truncating the journal and loosing data or recovering using the scan tools. >> >> Any ideas on solutions are welcome. I posted all the logs and and cluster >> design previously but am happy to do so again. We are not desperate but we >> are hurting with this long downtime. >> >> On Mon, Nov 5, 2018 at 7:05 PM Sergey Malinin wrote: >> >>> What kind of damage have you had? Maybe it is worth trying to get MDS to >>> start and backup valuable data instead of doing long running recovery? >>> >>> >>> On 6.11.2018, at 02:59, Rhian Resnick wrote: >>> >>> Sounds like I get to have some fun tonight. >>> >>> On Mon, Nov 5, 2018, 6:39 PM Sergey Malinin >> >>>> inode linkage (i.e. folder hierarchy) and file names are stored in omap >>>> data of objects in metadata pool. You can write a script that would >>>> traverse through all the metadata pool to find out file names correspond to >>>> objects in data pool and fetch required files via 'rados get' command. >>>> >>>> > On 6.11.2018, at 02:26, Sergey Malinin wrote: >>>> > >>>> > Yes, 'rados -h'. >>>> > >>>> > >>>> >> On 6.11.2018, at 02:25, Rhian Resnick wrote: >>>> >> >>>> >> Does a tool exist to recover files from a cephfs data partition? We >>>> are rebuilding metadata but have a user who needs data asap. >>>> >> ___ >>>> >> ceph-users mailing list >>>> >> ceph-users@lists.ceph.com >>>> >> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com >>>> > >>>> >>>> >>> >> > ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] Recover files from cephfs data pool
That was our original plan. So we migrated to bigger disks and have space but recover dentry uses up all our memory (128 GB) and crashes out. On Mon, Nov 5, 2018 at 7:23 PM Sergey Malinin wrote: > I had the same problem with multi-mds. I solved it by freeing up a little > space on OSDs, doing "recover dentries", truncating the journal, and then > "fs reset". After that I was able to revert to single-active MDS and kept > on running for a year until it failed on 13.2.2 upgrade :)) > > > On 6.11.2018, at 03:18, Rhian Resnick wrote: > > Our metadata pool went from 700 MB to 1 TB in size in a few hours. Used > all space on OSD and now 2 ranks report damage. The recovery tools on the > journal fail as they run out of memory leaving us with the option of > truncating the journal and loosing data or recovering using the scan tools. > > Any ideas on solutions are welcome. I posted all the logs and and cluster > design previously but am happy to do so again. We are not desperate but we > are hurting with this long downtime. > > On Mon, Nov 5, 2018 at 7:05 PM Sergey Malinin wrote: > >> What kind of damage have you had? Maybe it is worth trying to get MDS to >> start and backup valuable data instead of doing long running recovery? >> >> >> On 6.11.2018, at 02:59, Rhian Resnick wrote: >> >> Sounds like I get to have some fun tonight. >> >> On Mon, Nov 5, 2018, 6:39 PM Sergey Malinin > >>> inode linkage (i.e. folder hierarchy) and file names are stored in omap >>> data of objects in metadata pool. You can write a script that would >>> traverse through all the metadata pool to find out file names correspond to >>> objects in data pool and fetch required files via 'rados get' command. >>> >>> > On 6.11.2018, at 02:26, Sergey Malinin wrote: >>> > >>> > Yes, 'rados -h'. >>> > >>> > >>> >> On 6.11.2018, at 02:25, Rhian Resnick wrote: >>> >> >>> >> Does a tool exist to recover files from a cephfs data partition? We >>> are rebuilding metadata but have a user who needs data asap. >>> >> ___ >>> >> ceph-users mailing list >>> >> ceph-users@lists.ceph.com >>> >> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com >>> > >>> >>> >> > ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] Recover files from cephfs data pool
Our metadata pool went from 700 MB to 1 TB in size in a few hours. It used all the space on the OSDs and now 2 ranks report damage. The recovery tools on the journal fail as they run out of memory, leaving us with the option of truncating the journal and losing data or recovering using the scan tools. Any ideas on solutions are welcome. I posted all the logs and cluster design previously but am happy to do so again. We are not desperate but we are hurting with this long downtime. On Mon, Nov 5, 2018 at 7:05 PM Sergey Malinin wrote: > What kind of damage have you had? Maybe it is worth trying to get MDS to > start and backup valuable data instead of doing long running recovery? > > > On 6.11.2018, at 02:59, Rhian Resnick wrote: > > Sounds like I get to have some fun tonight. > > On Mon, Nov 5, 2018, 6:39 PM Sergey Malinin >> inode linkage (i.e. folder hierarchy) and file names are stored in omap >> data of objects in metadata pool. You can write a script that would >> traverse through all the metadata pool to find out file names correspond to >> objects in data pool and fetch required files via 'rados get' command. >> >> > On 6.11.2018, at 02:26, Sergey Malinin wrote: >> > >> > Yes, 'rados -h'. >> > >> > >> >> On 6.11.2018, at 02:25, Rhian Resnick wrote: >> >> >> >> Does a tool exist to recover files from a cephfs data partition? We >> are rebuilding metadata but have a user who needs data asap. >> >> ___ >> >> ceph-users mailing list >> >> ceph-users@lists.ceph.com >> >> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com >> > >> >> > ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
[ceph-users] Recover files from cephfs data pool
Does a tool exist to recover files from a cephfs data partition? We are rebuilding metadata but have a user who needs data asap. ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] speeding up ceph
What type of bandwidth did you see during the recovery process? We are seeing around 2 Mbps on each box running 20 processes each. On Mon, Nov 5, 2018 at 11:31 AM Sergey Malinin wrote: > Although I was advised not to use caching during recovery, I didn't notice > any improvements after disabling it. > > > > On 5.11.2018, at 17:32, Rhian Resnick wrote: > > > > We are running cephfs-data-scan to rebuild metadata. Would changing the > cache tier mode of our cephfs data partition improve performance? If so > what should we switch to? > > > > Thanks > > > > Rhian > > > > > > ___ > > ceph-users mailing list > > ceph-users@lists.ceph.com > > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com > > ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
[ceph-users] speeding up ceph
We are running cephfs-data-scan to rebuild metadata. Would changing the cache tier mode of our cephfs data partition improve performance? If so, what should we switch to? Thanks Rhian ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
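Not a recommendation (the reply earlier in this thread saw no improvement from disabling caching), but if anyone wants to experiment, the mode is changed per cache-tier pool; the pool name cephfs-cache below is a placeholder. readproxy avoids promoting objects on reads, and the change is reversible:

ceph osd tier cache-mode cephfs-cache readproxy
# revert once the scan is done
ceph osd tier cache-mode cephfs-cache writeback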
Re: [ceph-users] cephfs-data-scan
Sounds like we are going to restart with 20 threads on each storage node. On Sat, Nov 3, 2018 at 8:26 PM Sergey Malinin wrote: > scan_extents using 8 threads took 82 hours for my cluster holding 120M > files on 12 OSDs with 1 Gbps between nodes. I would have gone with a lot more > threads if I had known it only operated on the data pool and the only problem > was network latency. If I recall correctly, each worker used up to 800 MB of > RAM so beware the OOM killer. > scan_inodes runs several times faster but I don’t remember exact timing. > In your case I believe scan_extents & scan_inodes can be done in a few > hours by running the tool on each OSD node, but scan_links will be > painfully slow due to its single-threaded nature. > In my case I ended up getting MDS to start and copied all data to a fresh > filesystem ignoring errors. > On Nov 4, 2018, 02:22 +0300, Rhian Resnick , wrote: > > For a 150TB file system with 40 Million files how many cephfs-data-scan > threads should be used? Or what is the expected run time. (we have 160 osd > with 4TB disks.) > ___ > ceph-users mailing list > ceph-users@lists.ceph.com > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com > > ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
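A rough sketch of what "20 threads on each storage node" could look like using the tool's worker splitting (the same --worker_n/--worker_m pattern that appears later in this archive); worker_m is the total number of workers across all nodes, so with two nodes the second one would run worker_n 20 through 39. Pool and filesystem names are placeholders:

# node 1 of 2: workers 0-19 out of 40 in total
for n in $(seq 0 19); do
    nohup cephfs-data-scan scan_extents --worker_n $n --worker_m 40 \
        --filesystem cephfs cephfs_data > scan_extents.$n.log 2>&1 &
done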
[ceph-users] cephfs-data-scan
For a 150 TB file system with 40 million files, how many cephfs-data-scan threads should be used? Or what is the expected run time? (We have 160 OSDs with 4 TB disks.) ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
[ceph-users] Snapshot cephfs data pool from ceph cmd
Is it possible to snapshot the cephfs data pool? ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
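RADOS does support pool snapshots (ceph osd pool mksnap), but a pool that already uses self-managed snapshots, which CephFS pools do once filesystem snapshots are involved, will refuse them, so on a cephfs data pool this may simply be rejected; shown for completeness only, with cephfs_data as a placeholder name:

ceph osd pool mksnap cephfs_data before-recovery
ceph osd pool rmsnap cephfs_data before-recovery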
Re: [ceph-users] cephfs-journal-tool event recover_dentries summary killed due to memory usage
Having attempted to recover using the journal tool and having that fail, we are going to rebuild our metadata using a separate metadata pool. We have the following procedure we are going to use. The issue I haven't found yet (likely lack of sleep) is how to replace the original metadata pool in the cephfs so we can continue to use the default name, and then how to remove the secondary file system. # ceph fs ceph fs flag set enable_multiple true --yes-i-really-mean-it ceph osd pool create recovery 512 replicated replicated_ruleset ceph fs new recovery-fs recovery cephfs-cold --allow-dangerous-metadata-overlay cephfs-data-scan init --force-init --filesystem recovery-fs --alternate-pool recovery ceph fs reset recovery-fs --yes-i-really-mean-it # create structure cephfs-table-tool recovery-fs:all reset session cephfs-table-tool recovery-fs:all reset snap cephfs-table-tool recovery-fs:all reset inode # build new metadata # scan_extents cephfs-data-scan scan_extents --alternate-pool recovery --worker_n 0 --worker_m 4 --filesystem cephfs cephfs-cold cephfs-data-scan scan_extents --alternate-pool recovery --worker_n 1 --worker_m 4 --filesystem cephfs cephfs-cold cephfs-data-scan scan_extents --alternate-pool recovery --worker_n 2 --worker_m 4 --filesystem cephfs cephfs-cold cephfs-data-scan scan_extents --alternate-pool recovery --worker_n 3 --worker_m 4 --filesystem cephfs cephfs-cold # scan inodes cephfs-data-scan scan_inodes --alternate-pool recovery --worker_n 0 --worker_m 4 --filesystem cephfs --force-corrupt --force-init cephfs-cold cephfs-data-scan scan_inodes --alternate-pool recovery --worker_n 1 --worker_m 4 --filesystem cephfs --force-corrupt --force-init cephfs-cold cephfs-data-scan scan_inodes --alternate-pool recovery --worker_n 2 --worker_m 4 --filesystem cephfs --force-corrupt --force-init cephfs-cold cephfs-data-scan scan_inodes --alternate-pool recovery --worker_n 3 --worker_m 4 --filesystem cephfs --force-corrupt --force-init cephfs-cold cephfs-data-scan scan_links --filesystem recovery-fs # need help Thanks Rhian On Fri, Nov 2, 2018 at 9:47 PM Rhian Resnick wrote: > I was posting with my office account but I think it is being blocked. > > Our cephfs's metadata pool went from 1GB to 1TB in a matter of hours and > after using all storage on the OSD's reports two damaged ranks. > > The cephfs-journal-tool crashes when performing any operations due to > memory utilization. > > We tried a backup which crashed (we then did a rados cppool to backup our > metadata). > I then tried to run a dentry recovery which failed due to memory usage. > > Any recommendations for the next step?
> > Data from our config and status > > > > > Combined logs (after marking things as repaired to see if that would rescue > us): > > > Nov 1 10:07:02 ceph-p-mds2 ceph-mds: 2018-11-01 10:07:02.045499 7f68db7a3700 > -1 mds.4.purge_queue operator(): Error -108 loading Journaler > Nov 1 10:07:02 ceph-p-mds2 ceph-mds: 2018-11-01 10:07:02.045499 7f68db7a3700 > -1 mds.4.purge_queue operator(): Error -108 loading Journaler > Nov 1 10:26:40 ceph-p-mon2 ceph-mon: 2018-11-01 10:26:40.968143 7fa3b57ce700 > -1 log_channel(cluster) log [ERR] : Health check update: 1 mds daemon damaged > (MDS_DAMAGE) > Nov 1 10:26:40 ceph-p-mon2 ceph-mon: 2018-11-01 10:26:40.968143 7fa3b57ce700 > -1 log_channel(cluster) log [ERR] : Health check update: 1 mds daemon damaged > (MDS_DAMAGE) > Nov 1 10:26:47 ceph-storage2 ceph-mds: 2018-11-01 10:26:47.914934 > 7f6dacd69700 -1 mds.1.journaler.mdlog(ro) try_read_entry: decode error from > _is_readable > Nov 1 10:26:47 ceph-storage2 ceph-mds: mds.1 10.141.255.202:6898/1492854021 > 1 : Error loading MDS rank 1: (22) Invalid argument > Nov 1 10:26:47 ceph-storage2 ceph-mds: 2018-11-01 10:26:47.914949 > 7f6dacd69700 0 mds.1.log _replay journaler got error -22, aborting > Nov 1 10:26:47 ceph-storage2 ceph-mds: 2018-11-01 10:26:47.914934 > 7f6dacd69700 -1 mds.1.journaler.mdlog(ro) try_read_entry: decode error from > _is_readable > Nov 1 10:26:47 ceph-storage2 ceph-mds: 2018-11-01 10:26:47.915745 > 7f6dacd69700 -1 log_channel(cluster) log [ERR] : Error loading MDS rank 1: > (22) Invalid argument > Nov 1 10:26:47 ceph-storage2 ceph-mds: 2018-11-01 10:26:47.915745 > 7f6dacd69700 -1 log_channel(cluster) log [ERR] : Error loading MDS rank 1: > (22) Invalid argument > Nov 1 10:26:47 ceph-p-mon2 ceph-mon: 2018-11-01 10:26:47.999432 7fa3b57ce700 > -1 log_channel(cluster) log [ERR] : Health check update: 2 mds daemons > damaged (MDS_DAMAGE) > Nov 1 10:26:47 ceph-p-mon2 ceph-mon: 2018-11-01 10:26:47.999432 7fa3b57ce700 > -1 log_channel(cluster) log [ERR] : Health check update: 2 mds daemons > damaged (MDS_DAMAGE) > Nov 1 10:26:55 ceph-p-mon2 ceph-mon: 2018-11-01 10:26:55.026231 7fa3b57ce700 > -1 log_channel(clust
Re: [ceph-users] cephfs-journal-tool event recover_dentries summary killed due to memory usage
Morning, Having attempted to recover using the journal tool and having that fail, we are going to rebuild our metadata using a separate metadata pool. We have the following procedure we are going to use. The issue I haven't found yet (likely lack of sleep) is how to replace the original metadata pool in the cephfs so we can continue to use the default name, and then how to remove the secondary file system. # ceph fs ceph fs flag set enable_multiple true --yes-i-really-mean-it ceph osd pool create recovery 512 replicated replicated_ruleset ceph fs new recovery-fs recovery cephfs-cold --allow-dangerous-metadata-overlay cephfs-data-scan init --force-init --filesystem recovery-fs --alternate-pool recovery ceph fs reset recovery-fs --yes-i-really-mean-it # create structure cephfs-table-tool recovery-fs:all reset session cephfs-table-tool recovery-fs:all reset snap cephfs-table-tool recovery-fs:all reset inode # build new metadata # scan_extents cephfs-data-scan scan_extents --alternate-pool recovery --worker_n 0 --worker_m 4 --filesystem cephfs cephfs-cold cephfs-data-scan scan_extents --alternate-pool recovery --worker_n 1 --worker_m 4 --filesystem cephfs cephfs-cold cephfs-data-scan scan_extents --alternate-pool recovery --worker_n 2 --worker_m 4 --filesystem cephfs cephfs-cold cephfs-data-scan scan_extents --alternate-pool recovery --worker_n 3 --worker_m 4 --filesystem cephfs cephfs-cold # scan inodes cephfs-data-scan scan_inodes --alternate-pool recovery --worker_n 0 --worker_m 4 --filesystem cephfs --force-corrupt --force-init cephfs-cold cephfs-data-scan scan_inodes --alternate-pool recovery --worker_n 1 --worker_m 4 --filesystem cephfs --force-corrupt --force-init cephfs-cold cephfs-data-scan scan_inodes --alternate-pool recovery --worker_n 2 --worker_m 4 --filesystem cephfs --force-corrupt --force-init cephfs-cold cephfs-data-scan scan_inodes --alternate-pool recovery --worker_n 3 --worker_m 4 --filesystem cephfs --force-corrupt --force-init cephfs-cold cephfs-data-scan scan_links --filesystem recovery-fs # need help how to move the new metadata pool to the original filesystem? how to remove the new cephfs so the original mounts work. Rhian Resnick Associate Director Research Computing Enterprise Systems Office of Information Technology Florida Atlantic University 777 Glades Road, CM22, Rm 173B Boca Raton, FL 33431 Phone 561.297.2647 Fax 561.297.0222 [image] <https://hpc.fau.edu/wp-content/uploads/2015/01/image.jpg> From: ceph-users on behalf of Rhian Resnick Sent: Friday, November 2, 2018 9:47 PM To: ceph-users@lists.ceph.com Subject: [ceph-users] cephfs-journal-tool event recover_dentries summary killed due to memory usage I was posting with my office account but I think it is being blocked. Our cephfs's metadata pool went from 1GB to 1TB in a matter of hours and after using all storage on the OSD's reports two damaged ranks. The cephfs-journal-tool crashes when performing any operations due to memory utilization. We tried a backup which crashed (we then did a rados cppool to backup our metadata). I then tried to run a dentry recovery which failed due to memory usage. Any recommendations for the next step?
Data from our config and status Combined logs (after marking things as repaired to see if that would rescue us): Nov 1 10:07:02 ceph-p-mds2 ceph-mds: 2018-11-01 10:07:02.045499 7f68db7a3700 -1 mds.4.purge_queue operator(): Error -108 loading Journaler Nov 1 10:07:02 ceph-p-mds2 ceph-mds: 2018-11-01 10:07:02.045499 7f68db7a3700 -1 mds.4.purge_queue operator(): Error -108 loading Journaler Nov 1 10:26:40 ceph-p-mon2 ceph-mon: 2018-11-01 10:26:40.968143 7fa3b57ce700 -1 log_channel(cluster) log [ERR] : Health check update: 1 mds daemon damaged (MDS_DAMAGE) Nov 1 10:26:40 ceph-p-mon2 ceph-mon: 2018-11-01 10:26:40.968143 7fa3b57ce700 -1 log_channel(cluster) log [ERR] : Health check update: 1 mds daemon damaged (MDS_DAMAGE) Nov 1 10:26:47 ceph-storage2 ceph-mds: 2018-11-01 10:26:47.914934 7f6dacd69700 -1 mds.1.journaler.mdlog(ro) try_read_entry: decode error from _is_readable Nov 1 10:26:47 ceph-storage2 ceph-mds: mds.1 10.141.255.202:6898/1492854021<http://10.141.255.202:6898/1492854021> 1 : Error loading MDS rank 1: (22) Invalid argument Nov 1 10:26:47 ceph-storage2 ceph-mds: 2018-11-01 10:26:47.914949 7f6dacd69700 0 mds.1.log _replay journaler got error -22, aborting Nov 1 10:26:47 ceph-storage2 ceph-mds: 2018-11-01 10:26:47.914934 7f6dacd69700 -1 mds.1.journaler.mdlog(ro) try_read_entry: decode error from _is_readable Nov 1 10:26:47 ceph-storage2 ceph-mds: 2018-11-01 10:26:47.915745 7f6dacd69700 -1 log_channel(cluster) log [ERR] : Error loading MDS rank 1: (22) Invalid argument Nov 1 10:26:47 ceph-storage2 ceph-mds: 2018-11-01 10:26:47.915745 7f6dacd69700 -1 log_channel(cluster) log [ERR] : Error loading MDS rank 1: (22) Invalid argument Nov 1 10:26:47 ceph-p-mon2 ceph-
[ceph-users] cephfs-journal-tool event recover_dentries summary killed due to memory usage
I was posting with my office account but I think it is being blocked. Our cephfs's metadata pool went from 1GB to 1TB in a matter of hours and after using all storage on the OSD's reports two damaged ranks. The cephfs-journal-tool crashes when performing any operations due to memory utilization. We tried a backup which crashed (we then did a rados cppool to backup our metadata). I then tried to run a dentry recovery which failed due to memory usage. Any recommendations for the next step? Data from our config and status Combined logs (after marking things as repaired to see if that would rescue us): Nov 1 10:07:02 ceph-p-mds2 ceph-mds: 2018-11-01 10:07:02.045499 7f68db7a3700 -1 mds.4.purge_queue operator(): Error -108 loading Journaler Nov 1 10:07:02 ceph-p-mds2 ceph-mds: 2018-11-01 10:07:02.045499 7f68db7a3700 -1 mds.4.purge_queue operator(): Error -108 loading Journaler Nov 1 10:26:40 ceph-p-mon2 ceph-mon: 2018-11-01 10:26:40.968143 7fa3b57ce700 -1 log_channel(cluster) log [ERR] : Health check update: 1 mds daemon damaged (MDS_DAMAGE) Nov 1 10:26:40 ceph-p-mon2 ceph-mon: 2018-11-01 10:26:40.968143 7fa3b57ce700 -1 log_channel(cluster) log [ERR] : Health check update: 1 mds daemon damaged (MDS_DAMAGE) Nov 1 10:26:47 ceph-storage2 ceph-mds: 2018-11-01 10:26:47.914934 7f6dacd69700 -1 mds.1.journaler.mdlog(ro) try_read_entry: decode error from _is_readable Nov 1 10:26:47 ceph-storage2 ceph-mds: mds.1 10.141.255.202:6898/1492854021 1 : Error loading MDS rank 1: (22) Invalid argument Nov 1 10:26:47 ceph-storage2 ceph-mds: 2018-11-01 10:26:47.914949 7f6dacd69700 0 mds.1.log _replay journaler got error -22, aborting Nov 1 10:26:47 ceph-storage2 ceph-mds: 2018-11-01 10:26:47.914934 7f6dacd69700 -1 mds.1.journaler.mdlog(ro) try_read_entry: decode error from _is_readable Nov 1 10:26:47 ceph-storage2 ceph-mds: 2018-11-01 10:26:47.915745 7f6dacd69700 -1 log_channel(cluster) log [ERR] : Error loading MDS rank 1: (22) Invalid argument Nov 1 10:26:47 ceph-storage2 ceph-mds: 2018-11-01 10:26:47.915745 7f6dacd69700 -1 log_channel(cluster) log [ERR] : Error loading MDS rank 1: (22) Invalid argument Nov 1 10:26:47 ceph-p-mon2 ceph-mon: 2018-11-01 10:26:47.999432 7fa3b57ce700 -1 log_channel(cluster) log [ERR] : Health check update: 2 mds daemons damaged (MDS_DAMAGE) Nov 1 10:26:47 ceph-p-mon2 ceph-mon: 2018-11-01 10:26:47.999432 7fa3b57ce700 -1 log_channel(cluster) log [ERR] : Health check update: 2 mds daemons damaged (MDS_DAMAGE) Nov 1 10:26:55 ceph-p-mon2 ceph-mon: 2018-11-01 10:26:55.026231 7fa3b57ce700 -1 log_channel(cluster) log [ERR] : Health check update: 1 mds daemon damaged (MDS_DAMAGE) Nov 1 10:26:55 ceph-p-mon2 ceph-mon: 2018-11-01 10:26:55.026231 7fa3b57ce700 -1 log_channel(cluster) log [ERR] : Health check update: 1 mds daemon damaged (MDS_DAMAGE) Ceph OSD Status: (The missing and oud osd's are in a different pool from all data, these were the bad ssds that caused the issue) cluster: id: 6a2e8f21-bca2-492b-8869-eecc995216cc health: HEALTH_ERR 1 filesystem is degraded 2 mds daemons damaged services: mon: 3 daemons, quorum ceph-p-mon2,ceph-p-mon1,ceph-p-mon3 mgr: ceph-p-mon1(active), standbys: ceph-p-mon2 mds: cephfs-3/5/5 up {0=ceph-storage3=up:resolve,2=ceph-p-mon3=up:resolve,4=ceph-p-mds1=up:resolve}, 3 up:standby, 2 damaged osd: 170 osds: 167 up, 158 in data: pools: 7 pools, 7520 pgs objects: 188.46M objects, 161TiB usage: 275TiB used, 283TiB / 558TiB avail pgs: 7511 active+clean 9active+clean+scrubbing+deep io: client: 0B/s rd, 17.2KiB/s wr, 0op/s rd, 1op/s wr Ceph OSD Tree: ID CLASS 
WEIGHTTYPE NAME STATUS REWEIGHT PRI-AFF -10 0 root deefault -9 5.53958 root ssds -11 1.89296 host ceph-cache1 35 hdd 1.09109 osd.35 up0 1.0 181 hdd 0.26729 osd.181up0 1.0 182 hdd 0.26729 osd.182 down0 1.0 183 hdd 0.26729 osd.183 down0 1.0 -12 1.75366 host ceph-cache2 46 hdd 1.09109 osd.46 up0 1.0 185 hdd 0.26729 osd.185 down0 1.0 186 hdd 0.12799 osd.186up0 1.0 187 hdd 0.26729 osd.187up0 1.0 -13 1.89296 host ceph-cache3 60 hdd 1.09109 osd.60 up0 1.0 189 hdd 0.26729 osd.189up0 1.0 190 hdd 0.26729 osd.190up0 1.0 191 hdd 0.26729 osd.191up0 1.0 -5 4.33493 root ssds-ro -6 1.44498 host ceph-storage1-ssd 85 ssd 0.72249 osd.85 up 1.0 1.0 89 ssd 0.72249 osd.89 up
[ceph-users] Damaged MDS Ranks will not start / recover
up 1.0 1.0 77 hdd 3.63199 osd.77 up 1.0 1.0 82 hdd 3.63199 osd.82 up 1.0 1.0 86 hdd 3.63199 osd.86 up 1.0 1.0 88 hdd 3.63199 osd.88 up 1.0 1.0 95 hdd 3.63199 osd.95 up 1.0 1.0 103 hdd 3.63199 osd.103up 1.0 1.0 109 hdd 3.63199 osd.109up 1.0 1.0 113 hdd 3.63199 osd.113up 1.0 1.0 120 hdd 3.63199 osd.120up 1.0 1.0 127 hdd 3.63199 osd.127up 1.0 1.0 134 hdd 3.63199 osd.134up 1.0 1.0 140 hdd 3.63869 osd.140up 1.0 1.0 141 hdd 3.63199 osd.141up 1.0 1.0 143 hdd 3.63199 osd.143up 1.0 1.0 144 hdd 3.63199 osd.144up 1.0 1.0 145 hdd 3.63199 osd.145up 1.0 1.0 146 hdd 3.63199 osd.146up 1.0 1.0 147 hdd 3.63199 osd.147up 1.0 1.0 148 hdd 3.63199 osd.148up 1.0 1.0 149 hdd 3.63199 osd.149up 1.0 1.0 150 hdd 3.63199 osd.150up 1.0 1.0 151 hdd 3.63199 osd.151up 1.0 1.0 152 hdd 3.63199 osd.152up 1.0 1.0 153 hdd 3.63199 osd.153up 1.0 1.0 154 hdd 3.63199 osd.154up 1.0 1.0 155 hdd 3.63199 osd.155up 1.0 1.0 156 hdd 3.63199 osd.156up 1.0 1.0 157 hdd 3.63199 osd.157up 1.0 1.0 158 hdd 3.63199 osd.158up 1.0 1.0 159 hdd 3.63199 osd.159up 1.0 1.0 161 hdd 3.63199 osd.161up 1.0 1.0 162 hdd 3.63199 osd.162up 1.0 1.0 164 hdd 3.63199 osd.164up 1.0 1.0 165 hdd 3.63199 osd.165up 1.0 1.0 167 hdd 3.63199 osd.167up 1.0 1.0 168 hdd 3.63199 osd.168up 1.0 1.0 169 hdd 3.63199 osd.169up 1.0 1.0 170 hdd 3.63199 osd.170up 1.0 1.0 171 hdd 3.63199 osd.171up 1.0 1.0 172 hdd 3.63199 osd.172up 1.0 1.0 173 hdd 3.63199 osd.173up 1.0 1.0 174 hdd 3.63869 osd.174up 1.0 1.0 177 hdd 3.63199 osd.177up 1.0 1.0 # Ceph configuration shared by all nodes [global] fsid = 6a2e8f21-bca2-492b-8869-eecc995216cc public_network = 10.141.0.0/16 cluster_network = 10.85.8.0/22 mon_initial_members = ceph-p-mon1, ceph-p-mon2, ceph-p-mon3 mon_host = 10.141.161.248,10.141.160.250,10.141.167.237 auth_cluster_required = cephx auth_service_required = cephx auth_client_required = cephx # Cephfs needs these to be set to support larger directories mds_bal_frag = true allow_dirfrags = true rbd_default_format = 2 mds_beacon_grace = 60 mds session timeout = 120 log to syslog = true err to syslog = true clog to syslog = true [mds] [osd] osd op threads = 32 osd max backfills = 32 # Old method of moving ssds to a pool [osd.85] host = ceph-storage1 crush_location = root=ssds host=ceph-storage1-ssd [osd.89] host = ceph-storage1 crush_location = root=ssds host=ceph-storage1-ssd [osd.160] host = ceph-storage3 crush_location = root=ssds host=ceph-storage3-ssd [osd.163] host = ceph-storage3 crush_location = root=ssds host=ceph-storage3-ssd [osd.166] host = ceph-storage3 crush_location = root=ssds host=ceph-storage3-ssd [osd.5] host = ceph-storage2 crush_location = root=ssds host=ceph-storage2-ssd [osd.68] host = ceph-storage2 crush_location = root=ssds host=ceph-storage2-ssd [osd.87] host = ceph-storage2 crush_location = root=ssds host=ceph-storage2-ssd Rhian Resnick Associate Director Research Computing Enterprise Systems Office of Information Technology Florida Atlantic University 777 Glades Road, CM22, Rm 173B Boca Raton, FL 33431 Phone 561.297.2647 Fax 561.297.0222 [image] <https://hpc.fau.edu/wp-content/uploads/2015/01/image.jpg> ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] Removing MDS
Morning our backup of the metadata is 75% done (rados cppool as the metadata export fails by using up all server memory). Before we start working on fixing our metadata we wanted our projected procedure to be reviewed. Does the following sequence look correct for our environment? 1. rados cppool cephfs_metadata cephfs_metadata.bk 2. cephfs-journal-tool event recover_dentries summary --rank=0 3. cephfs-journal-tool event recover_dentries summary --rank=1 4. cephfs-journal-tool event recover_dentries summary --rank=2 5. cephfs-journal-tool event recover_dentries summary --rank=3 6. cephfs-journal-tool event recover_dentries summary --rank=4 7. cephfs-journal-tool journal reset --rank=0 8. cephfs-journal-tool journal reset --rank=1 9. cephfs-journal-tool journal reset --rank=2 10. cephfs-journal-tool journal reset --rank=3 11. cephfs-journal-tool journal reset --rank=4 12. cephfs-table-tool all reset session 13. Start metadata servers 14. Scrub mds: * ceph daemon mds.{hostname} scrub_path / recursive * ceph daemon mds.{hostname} scrub_path / force 15. Rhian Resnick Associate Director Research Computing Enterprise Systems Office of Information Technology Florida Atlantic University 777 Glades Road, CM22, Rm 173B Boca Raton, FL 33431 Phone 561.297.2647 Fax 561.297.0222 [image] <https://hpc.fau.edu/wp-content/uploads/2015/01/image.jpg> From: Rhian Resnick Sent: Thursday, November 1, 2018 10:32 AM To: Patrick Donnelly Cc: Ceph Users Subject: Re: [ceph-users] Removing MDS Morning all, This has been a rough couple days. We thought we had resolved all our performance issues by moving the ceph metadata to some high intensity write disks from Intel but what we didn't notice was that Ceph labeled them as HDD's (thanks dell raid controller). We believe this caused read lock errors and resulted in the journal increasing from 700MB to 1 TB in 2 hours. (Basically over lunch) We tried to migrate and then stop everything before the OSD's reached full status but failed. Over the last 12 hours the data has been migrated from the SDD's back to spinning disks but the MDS servers are now reporting that two ranks are damaged. We are running a backup of the metadata pool but wanted to know what the list thinks the next steps should be. I have attached the error's we see in the logs as well as our OSD Tree, ceph.conf (comments removed), and ceph fs dump. 
Combined logs (after marking things as repaired to see if that would rescue us): Nov 1 10:07:02 ceph-p-mds2 ceph-mds: 2018-11-01 10:07:02.045499 7f68db7a3700 -1 mds.4.purge_queue operator(): Error -108 loading Journaler Nov 1 10:07:02 ceph-p-mds2 ceph-mds: 2018-11-01 10:07:02.045499 7f68db7a3700 -1 mds.4.purge_queue operator(): Error -108 loading Journaler Nov 1 10:26:40 ceph-p-mon2 ceph-mon: 2018-11-01 10:26:40.968143 7fa3b57ce700 -1 log_channel(cluster) log [ERR] : Health check update: 1 mds daemon damaged (MDS_DAMAGE) Nov 1 10:26:40 ceph-p-mon2 ceph-mon: 2018-11-01 10:26:40.968143 7fa3b57ce700 -1 log_channel(cluster) log [ERR] : Health check update: 1 mds daemon damaged (MDS_DAMAGE) Nov 1 10:26:47 ceph-storage2 ceph-mds: 2018-11-01 10:26:47.914934 7f6dacd69700 -1 mds.1.journaler.mdlog(ro) try_read_entry: decode error from _is_readable Nov 1 10:26:47 ceph-storage2 ceph-mds: mds.1 10.141.255.202:6898/1492854021 1 : Error loading MDS rank 1: (22) Invalid argument Nov 1 10:26:47 ceph-storage2 ceph-mds: 2018-11-01 10:26:47.914949 7f6dacd69700 0 mds.1.log _replay journaler got error -22, aborting Nov 1 10:26:47 ceph-storage2 ceph-mds: 2018-11-01 10:26:47.914934 7f6dacd69700 -1 mds.1.journaler.mdlog(ro) try_read_entry: decode error from _is_readable Nov 1 10:26:47 ceph-storage2 ceph-mds: 2018-11-01 10:26:47.915745 7f6dacd69700 -1 log_channel(cluster) log [ERR] : Error loading MDS rank 1: (22) Invalid argument Nov 1 10:26:47 ceph-storage2 ceph-mds: 2018-11-01 10:26:47.915745 7f6dacd69700 -1 log_channel(cluster) log [ERR] : Error loading MDS rank 1: (22) Invalid argument Nov 1 10:26:47 ceph-p-mon2 ceph-mon: 2018-11-01 10:26:47.999432 7fa3b57ce700 -1 log_channel(cluster) log [ERR] : Health check update: 2 mds daemons damaged (MDS_DAMAGE) Nov 1 10:26:47 ceph-p-mon2 ceph-mon: 2018-11-01 10:26:47.999432 7fa3b57ce700 -1 log_channel(cluster) log [ERR] : Health check update: 2 mds daemons damaged (MDS_DAMAGE) Nov 1 10:26:55 ceph-p-mon2 ceph-mon: 2018-11-01 10:26:55.026231 7fa3b57ce700 -1 log_channel(cluster) log [ERR] : Health check update: 1 mds daemon damaged (MDS_DAMAGE) Nov 1 10:26:55 ceph-p-mon2 ceph-mon: 2018-11-01 10:26:55.026231 7fa3b57ce700 -1 log_channel(cluster) log [ERR] : Health check update: 1 mds daemon damaged (MDS_DAMAGE) Ceph OSD Status: (The missing and oud osd's are in a different pool from all data, these were the bad ssds that caused the issue) cluster: id: 6a2e8f21-bca2-492
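Before running the recover_dentries/journal-reset sequence proposed earlier in this message, it can be worth a read-only check of how far each rank's journal is actually readable, plus an export as insurance if it fits in memory and on disk (same --rank spelling as in the numbered list above; repeat per rank):

cephfs-journal-tool journal inspect --rank=0
cephfs-journal-tool journal export backup.rank0.bin --rank=0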
Re: [ceph-users] Removing MDS
up 1.0 1.0 149 hdd 3.63199 osd.149up 1.0 1.0 150 hdd 3.63199 osd.150up 1.0 1.0 151 hdd 3.63199 osd.151up 1.0 1.0 152 hdd 3.63199 osd.152up 1.0 1.0 153 hdd 3.63199 osd.153up 1.0 1.0 154 hdd 3.63199 osd.154up 1.0 1.0 155 hdd 3.63199 osd.155up 1.0 1.0 156 hdd 3.63199 osd.156up 1.0 1.0 157 hdd 3.63199 osd.157up 1.0 1.0 158 hdd 3.63199 osd.158up 1.0 1.0 159 hdd 3.63199 osd.159up 1.0 1.0 161 hdd 3.63199 osd.161up 1.0 1.0 162 hdd 3.63199 osd.162up 1.0 1.0 164 hdd 3.63199 osd.164up 1.0 1.0 165 hdd 3.63199 osd.165up 1.0 1.0 167 hdd 3.63199 osd.167up 1.0 1.0 168 hdd 3.63199 osd.168up 1.0 1.0 169 hdd 3.63199 osd.169up 1.0 1.0 170 hdd 3.63199 osd.170up 1.0 1.0 171 hdd 3.63199 osd.171up 1.0 1.0 172 hdd 3.63199 osd.172up 1.0 1.0 173 hdd 3.63199 osd.173up 1.0 1.0 174 hdd 3.63869 osd.174up 1.0 1.0 177 hdd 3.63199 osd.177up 1.0 1.0 # Ceph configuration shared by all nodes [global] fsid = 6a2e8f21-bca2-492b-8869-eecc995216cc public_network = 10.141.0.0/16 cluster_network = 10.85.8.0/22 mon_initial_members = ceph-p-mon1, ceph-p-mon2, ceph-p-mon3 mon_host = 10.141.161.248,10.141.160.250,10.141.167.237 auth_cluster_required = cephx auth_service_required = cephx auth_client_required = cephx # Cephfs needs these to be set to support larger directories mds_bal_frag = true allow_dirfrags = true rbd_default_format = 2 mds_beacon_grace = 60 mds session timeout = 120 log to syslog = true err to syslog = true clog to syslog = true [mds] [osd] osd op threads = 32 osd max backfills = 32 # Old method of moving ssds to a pool [osd.85] host = ceph-storage1 crush_location = root=ssds host=ceph-storage1-ssd [osd.89] host = ceph-storage1 crush_location = root=ssds host=ceph-storage1-ssd [osd.160] host = ceph-storage3 crush_location = root=ssds host=ceph-storage3-ssd [osd.163] host = ceph-storage3 crush_location = root=ssds host=ceph-storage3-ssd [osd.166] host = ceph-storage3 crush_location = root=ssds host=ceph-storage3-ssd [osd.5] host = ceph-storage2 crush_location = root=ssds host=ceph-storage2-ssd [osd.68] host = ceph-storage2 crush_location = root=ssds host=ceph-storage2-ssd [osd.87] host = ceph-storage2 crush_location = root=ssds host=ceph-storage2-ssd Rhian Resnick Associate Director Research Computing Enterprise Systems Office of Information Technology Florida Atlantic University 777 Glades Road, CM22, Rm 173B Boca Raton, FL 33431 Phone 561.297.2647 Fax 561.297.0222 [image] <https://hpc.fau.edu/wp-content/uploads/2015/01/image.jpg> From: Patrick Donnelly Sent: Tuesday, October 30, 2018 8:40 PM To: Rhian Resnick Cc: Ceph Users Subject: Re: [ceph-users] Removing MDS On Tue, Oct 30, 2018 at 4:05 PM Rhian Resnick wrote: > We are running into issues deactivating mds ranks. Is there a way to safely > forcibly remove a rank? No, there's no "safe" way to force the issue. The rank needs to come back, flush its journal, and then complete its deactivation. To get more help, you need to describe your environment, version of Ceph in use, relevant log snippets, etc. -- Patrick Donnelly ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] Removing MDS
That is what I thought. I am increasing debug to see where we are getting stuck. I am not sure if it is an issue deactivating or an rdlock issue. Thanks, if we discover more we will post a question with details. Rhian Resnick Associate Director Research Computing Enterprise Systems Office of Information Technology Florida Atlantic University 777 Glades Road, CM22, Rm 173B Boca Raton, FL 33431 Phone 561.297.2647 Fax 561.297.0222 [image] <https://hpc.fau.edu/wp-content/uploads/2015/01/image.jpg> From: Patrick Donnelly Sent: Tuesday, October 30, 2018 8:40 PM To: Rhian Resnick Cc: Ceph Users Subject: Re: [ceph-users] Removing MDS On Tue, Oct 30, 2018 at 4:05 PM Rhian Resnick wrote: > We are running into issues deactivating mds ranks. Is there a way to safely > forcibly remove a rank? No, there's no "safe" way to force the issue. The rank needs to come back, flush its journal, and then complete its deactivation. To get more help, you need to describe your environment, version of Ceph in use, relevant log snippets, etc. -- Patrick Donnelly ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
[ceph-users] Removing MDS
Evening, We are running into issues deactivating mds ranks. Is there a way to safely forcibly remove a rank? Rhian Resnick Associate Director Research Computing Enterprise Systems Office of Information Technology Florida Atlantic University 777 Glades Road, CM22, Rm 173B Boca Raton, FL 33431 Phone 561.297.2647 Fax 561.297.0222 [image] <https://hpc.fau.edu/wp-content/uploads/2015/01/image.jpg> ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] Reducing Max_mds
John, Thanks! Rhian Resnick Associate Director Research Computing Enterprise Systems Office of Information Technology Florida Atlantic University 777 Glades Road, CM22, Rm 173B Boca Raton, FL 33431 Phone 561.297.2647 Fax 561.297.0222 [image] <https://hpc.fau.edu/wp-content/uploads/2015/01/image.jpg> From: John Spray Sent: Tuesday, October 30, 2018 5:26 AM To: Rhian Resnick Cc: ceph-users@lists.ceph.com Subject: Re: [ceph-users] Reducing Max_mds On Tue, Oct 30, 2018 at 6:36 AM Rhian Resnick wrote: > > Evening, > > > I am looking to decrease our max mds servers as we had a server failure and > need to remove a node. > > > When we attempt to decrease the number of mds servers from 5 to 4 (or any > other number) they never transition to standby. They just stay active. > > > ceph fs set cephfs max_mds X After you decrease max_mds, use "ceph mds deactivate " to bring the actual number of active daemons in line with your new intended maximum. From Ceph 13.x that happens automatically, but since you're on 12.x it needs doing by hand. John > > Nothing looks useful in the mds or mon logs and I was wondering what you > recommend looking at? > > > We are on 12.2.9 running Centos. > > > ___ > ceph-users mailing list > ceph-users@lists.ceph.com > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
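Concretely, for the situation in this thread (Luminous, dropping from 5 to 4 active MDS daemons; the filesystem name cephfs is assumed), the sequence John describes would look roughly like:

ceph fs set cephfs max_mds 4
ceph mds deactivate cephfs:4   # stop the highest rank; repeat for each rank above the new maximum
ceph fs get cephfs             # confirm the active rank count and max_mds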
[ceph-users] Reducing Max_mds
Evening, I am looking to decrease our max mds servers as we had a server failure and need to remove a node. When we attempt to decrease the number of mds servers from 5 to 4 (or any other number) they never transition to standby. They just stay active. ceph fs set cephfs max_mds X Nothing looks useful in the mds or mon logs and I was wondering what you recommend looking at? We are on 12.2.9 running Centos. ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] Error Creating OSD
Afternoon, Happily, I resolved this issue. Running vgdisplay showed that ceph-volume tried to create a disk on failed disk. (We didn't know we had a bad did so this is information that was new to us) and when the command failed it left three bad volume groups. Since you cannot rename them you need to use the following command to delete them. vgdisplay to find the bad volume groups vgremove --select vg_uuid=your uuid -f # -f forces it to be removed Rhian Resnick Associate Director Middleware and HPC Office of Information Technology Florida Atlantic University 777 Glades Road, CM22, Rm 173B Boca Raton, FL 33431 Phone 561.297.2647 Fax 561.297.0222 [image] <https://hpc.fau.edu/wp-content/uploads/2015/01/image.jpg> From: Rhian Resnick Sent: Saturday, April 14, 2018 12:47 PM To: Alfredo Deza Cc: ceph-users@lists.ceph.com Subject: Re: [ceph-users] Error Creating OSD Thanks all, Here is a link to our our command being executed: https://pastebin.com/iy8iSaKH Here are the results from the command Executed with debug enabled (after a zap with destroy) [root@ceph-storage3 ~]# ceph-volume lvm create --bluestore --data /dev/sdu Running command: ceph-authtool --gen-print-key Running command: ceph --cluster ceph --name client.bootstrap-osd --keyring /var/lib/ceph/bootstrap-osd/ceph.keyring -i - osd new 664894a8-530a-4557-b2f4-1af5b391f2b7 --> Was unable to complete a new OSD, will rollback changes --> OSD will be fully purged from the cluster, because the ID was generated Running command: ceph osd purge osd.140 --yes-i-really-mean-it stderr: purged osd.140 Traceback (most recent call last): File "/sbin/ceph-volume", line 6, in main.Volume() File "/usr/lib/python2.7/site-packages/ceph_volume/main.py", line 37, in __init__ self.main(self.argv) File "/usr/lib/python2.7/site-packages/ceph_volume/decorators.py", line 59, in newfunc return f(*a, **kw) File "/usr/lib/python2.7/site-packages/ceph_volume/main.py", line 153, in main terminal.dispatch(self.mapper, subcommand_args) File "/usr/lib/python2.7/site-packages/ceph_volume/terminal.py", line 182, in dispatch instance.main() File "/usr/lib/python2.7/site-packages/ceph_volume/devices/lvm/main.py", line 38, in main terminal.dispatch(self.mapper, self.argv) File "/usr/lib/python2.7/site-packages/ceph_volume/terminal.py", line 182, in dispatch instance.main() File "/usr/lib/python2.7/site-packages/ceph_volume/devices/lvm/create.py", line 74, in main self.create(args) File "/usr/lib/python2.7/site-packages/ceph_volume/decorators.py", line 16, in is_root return func(*a, **kw) File "/usr/lib/python2.7/site-packages/ceph_volume/devices/lvm/create.py", line 26, in create prepare_step.safe_prepare(args) File "/usr/lib/python2.7/site-packages/ceph_volume/devices/lvm/prepare.py", line 217, in safe_prepare self.prepare(args) File "/usr/lib/python2.7/site-packages/ceph_volume/decorators.py", line 16, in is_root return func(*a, **kw) File "/usr/lib/python2.7/site-packages/ceph_volume/devices/lvm/prepare.py", line 283, in prepare block_lv = self.prepare_device(args.data, 'block', cluster_fsid, osd_fsid) File "/usr/lib/python2.7/site-packages/ceph_volume/devices/lvm/prepare.py", line 193, in prepare_device if api.get_vg(vg_name=vg_name): File "/usr/lib/python2.7/site-packages/ceph_volume/api/lvm.py", line 334, in get_vg return vgs.get(vg_name=vg_name, vg_tags=vg_tags) File "/usr/lib/python2.7/site-packages/ceph_volume/api/lvm.py", line 429, in get raise MultipleVGsError(vg_name) ceph_volume.exceptions.MultipleVGsError: Got more than 1 result looking for volume group: 
ceph-6a2e8f21-bca2-492b-8869-eecc995216cc Rhian Resnick Associate Director Middleware and HPC Office of Information Technology Florida Atlantic University 777 Glades Road, CM22, Rm 173B Boca Raton, FL 33431 Phone 561.297.2647 Fax 561.297.0222 [image] <https://hpc.fau.edu/wp-content/uploads/2015/01/image.jpg> From: Alfredo Deza <ad...@redhat.com> Sent: Saturday, April 14, 2018 8:45 AM To: Rhian Resnick Cc: ceph-users@lists.ceph.com Subject: Re: [ceph-users] Error Creating OSD On Fri, Apr 13, 2018 at 8:20 PM, Rhian Resnick <rresn...@fau.edu<mailto:rresn...@fau.edu>> wrote: Evening, When attempting to create an OSD we receive the following error. [ceph-admin@ceph-storage3 ~]$ sudo ceph-volume lvm create --bluestore --data /dev/sdu Running command: ceph-authtool --gen-print-key Running command: ceph --cluster ceph --name client.bootstrap-osd --keyring /var/lib/ceph/bootstrap-osd/ceph.keyring -i - osd new c8cb8cff-dad9-48b8-8d77-6f130a4b629d --> Was unable to complete a new OSD, will rollback changes
Re: [ceph-users] Error Creating OSD
Thanks all, Here is a link to our our command being executed: https://pastebin.com/iy8iSaKH Here are the results from the command Executed with debug enabled (after a zap with destroy) [root@ceph-storage3 ~]# ceph-volume lvm create --bluestore --data /dev/sdu Running command: ceph-authtool --gen-print-key Running command: ceph --cluster ceph --name client.bootstrap-osd --keyring /var/lib/ceph/bootstrap-osd/ceph.keyring -i - osd new 664894a8-530a-4557-b2f4-1af5b391f2b7 --> Was unable to complete a new OSD, will rollback changes --> OSD will be fully purged from the cluster, because the ID was generated Running command: ceph osd purge osd.140 --yes-i-really-mean-it stderr: purged osd.140 Traceback (most recent call last): File "/sbin/ceph-volume", line 6, in main.Volume() File "/usr/lib/python2.7/site-packages/ceph_volume/main.py", line 37, in __init__ self.main(self.argv) File "/usr/lib/python2.7/site-packages/ceph_volume/decorators.py", line 59, in newfunc return f(*a, **kw) File "/usr/lib/python2.7/site-packages/ceph_volume/main.py", line 153, in main terminal.dispatch(self.mapper, subcommand_args) File "/usr/lib/python2.7/site-packages/ceph_volume/terminal.py", line 182, in dispatch instance.main() File "/usr/lib/python2.7/site-packages/ceph_volume/devices/lvm/main.py", line 38, in main terminal.dispatch(self.mapper, self.argv) File "/usr/lib/python2.7/site-packages/ceph_volume/terminal.py", line 182, in dispatch instance.main() File "/usr/lib/python2.7/site-packages/ceph_volume/devices/lvm/create.py", line 74, in main self.create(args) File "/usr/lib/python2.7/site-packages/ceph_volume/decorators.py", line 16, in is_root return func(*a, **kw) File "/usr/lib/python2.7/site-packages/ceph_volume/devices/lvm/create.py", line 26, in create prepare_step.safe_prepare(args) File "/usr/lib/python2.7/site-packages/ceph_volume/devices/lvm/prepare.py", line 217, in safe_prepare self.prepare(args) File "/usr/lib/python2.7/site-packages/ceph_volume/decorators.py", line 16, in is_root return func(*a, **kw) File "/usr/lib/python2.7/site-packages/ceph_volume/devices/lvm/prepare.py", line 283, in prepare block_lv = self.prepare_device(args.data, 'block', cluster_fsid, osd_fsid) File "/usr/lib/python2.7/site-packages/ceph_volume/devices/lvm/prepare.py", line 193, in prepare_device if api.get_vg(vg_name=vg_name): File "/usr/lib/python2.7/site-packages/ceph_volume/api/lvm.py", line 334, in get_vg return vgs.get(vg_name=vg_name, vg_tags=vg_tags) File "/usr/lib/python2.7/site-packages/ceph_volume/api/lvm.py", line 429, in get raise MultipleVGsError(vg_name) ceph_volume.exceptions.MultipleVGsError: Got more than 1 result looking for volume group: ceph-6a2e8f21-bca2-492b-8869-eecc995216cc Rhian Resnick Associate Director Middleware and HPC Office of Information Technology Florida Atlantic University 777 Glades Road, CM22, Rm 173B Boca Raton, FL 33431 Phone 561.297.2647 Fax 561.297.0222 [image] <https://hpc.fau.edu/wp-content/uploads/2015/01/image.jpg> From: Alfredo Deza <ad...@redhat.com> Sent: Saturday, April 14, 2018 8:45 AM To: Rhian Resnick Cc: ceph-users@lists.ceph.com Subject: Re: [ceph-users] Error Creating OSD On Fri, Apr 13, 2018 at 8:20 PM, Rhian Resnick <rresn...@fau.edu<mailto:rresn...@fau.edu>> wrote: Evening, When attempting to create an OSD we receive the following error. 
[ceph-admin@ceph-storage3 ~]$ sudo ceph-volume lvm create --bluestore --data /dev/sdu Running command: ceph-authtool --gen-print-key Running command: ceph --cluster ceph --name client.bootstrap-osd --keyring /var/lib/ceph/bootstrap-osd/ceph.keyring -i - osd new c8cb8cff-dad9-48b8-8d77-6f130a4b629d --> Was unable to complete a new OSD, will rollback changes --> OSD will be fully purged from the cluster, because the ID was generated Running command: ceph osd purge osd.140 --yes-i-really-mean-it stderr: purged osd.140 --> MultipleVGsError: Got more than 1 result looking for volume group: ceph-6a2e8f21-bca2-492b-8869-eecc995216cc Any hints on what to do? This occurs when we attempt to create osd's on this node. Can you use a paste site and get the /var/log/ceph/ceph-volume.log contents? Also, if you could try the same command but with: CEPH_VOLUME_DEBUG=1 I think you are hitting two issues here: 1) Somehow `osd new` is not completing and failing 2) The `purge` command to wipe out the LV is getting multiple LV's and cannot make sure to match the one it used. #2 definitely looks like something we are doing wrong, and #1 can have a lot of different causes. The logs would be tremendously helpful! Rhian Resnick Associate Director Middleware and HPC Office of Inf
[ceph-users] Error Creating OSD
Evening, When attempting to create an OSD we receive the following error. [ceph-admin@ceph-storage3 ~]$ sudo ceph-volume lvm create --bluestore --data /dev/sdu Running command: ceph-authtool --gen-print-key Running command: ceph --cluster ceph --name client.bootstrap-osd --keyring /var/lib/ceph/bootstrap-osd/ceph.keyring -i - osd new c8cb8cff-dad9-48b8-8d77-6f130a4b629d --> Was unable to complete a new OSD, will rollback changes --> OSD will be fully purged from the cluster, because the ID was generated Running command: ceph osd purge osd.140 --yes-i-really-mean-it stderr: purged osd.140 --> MultipleVGsError: Got more than 1 result looking for volume group: ceph-6a2e8f21-bca2-492b-8869-eecc995216cc Any hints on what to do? This occurs when we attempt to create osd's on this node. Rhian Resnick Associate Director Middleware and HPC Office of Information Technology Florida Atlantic University 777 Glades Road, CM22, Rm 173B Boca Raton, FL 33431 Phone 561.297.2647 Fax 561.297.0222 [image] <https://hpc.fau.edu/wp-content/uploads/2015/01/image.jpg> ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
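For anyone hitting the same MultipleVGsError later: the volume group ceph-volume is looking for is named after the cluster fsid, so the duplicate VGs it is complaining about (the ones removed with vgremove in the resolution posted in this thread) can be listed with plain LVM tools, roughly:

vgs -o vg_name,vg_uuid,pv_name | grep 6a2e8f21-bca2-492b-8869-eecc995216cc
# then remove the stray ones by UUID, e.g.
vgremove --select vg_uuid=<uuid-from-the-vgs-output> -f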
[ceph-users] cephfs increase max file size
Morning, We ran into an issue with the default max file size of a cephfs file. Is it possible to increase this value to 20 TB from 1 TB without recreating the file system? Rhian Resnick Assistant Director Middleware and HPC Office of Information Technology Florida Atlantic University 777 Glades Road, CM22, Rm 173B Boca Raton, FL 33431 Phone 561.297.2647 Fax 561.297.0222 [image] <https://hpc.fau.edu/wp-content/uploads/2015/01/image.jpg> ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
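For reference, max_file_size is a per-filesystem setting expressed in bytes and can be raised at runtime; the filesystem name cephfs is assumed here, and 20 TiB is 20 * 2^40 bytes:

ceph fs get cephfs | grep max_file_size
ceph fs set cephfs max_file_size 21990232555520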
Re: [ceph-users] Inconsistent pgs with size_mismatch_oi
I didn't see any guidance online on how to resolve the checksum error. Any hints?

Rhian Resnick
Assistant Director Middleware and HPC
Office of Information Technology
Florida Atlantic University
777 Glades Road, CM22, Rm 173B
Boca Raton, FL 33431
Phone 561.297.2647 Fax 561.297.0222

From: Gregory Farnum <gfar...@redhat.com>
Sent: Monday, July 3, 2017 11:49 AM
To: Rhian Resnick
Cc: ceph-users
Subject: Re: [ceph-users] Inconsistent pgs with size_mismatch_oi

On Mon, Jul 3, 2017 at 6:02 AM, Rhian Resnick <rresn...@fau.edu> wrote:
>
> Sorry to bring up an old post, but on Kraken I am unable to repair a PG that
> is inconsistent in a cache tier. We removed the bad object but are still
> seeing the following error in the OSD's logs.

It's possible, but the digest error means they checksum differently, rather than having different sizes (and the size check precedes the digest one). The part where all three of them are exactly the same is interesting and actually makes me suspect that something just went wrong in calculating the checksum...

>
> Prior to removing invalid object:
>
> /var/log/ceph/ceph-osd.126.log:928:2017-07-03 08:07:55.331479 7f95a73eb700 -1 log_channel(cluster) log [ERR] : 1.15f shard 63: soid 1:fa86fe35:::10006cdc2c5.:head data_digest 0x931041e9 != data_digest 0xcd130b55 from auth oi 1:fa86fe35:::10006cdc2c5.:head(25726'1664129 client.8168902.0:607753 dirty|data_digest s 1713351 uv 1664129 dd cd130b55 alloc_hint [0 0 0])
> /var/log/ceph/ceph-osd.126.log:929:2017-07-03 08:07:55.331483 7f95a73eb700 -1 log_channel(cluster) log [ERR] : 1.15f shard 126: soid 1:fa86fe35:::10006cdc2c5.:head data_digest 0x931041e9 != data_digest 0xcd130b55 from auth oi 1:fa86fe35:::10006cdc2c5.:head(25726'1664129 client.8168902.0:607753 dirty|data_digest s 1713351 uv 1664129 dd cd130b55 alloc_hint [0 0 0])
> /var/log/ceph/ceph-osd.126.log:930:2017-07-03 08:07:55.331487 7f95a73eb700 -1 log_channel(cluster) log [ERR] : 1.15f shard 143: soid 1:fa86fe35:::10006cdc2c5.:head data_digest 0x931041e9 != data_digest 0xcd130b55 from auth oi 1:fa86fe35:::10006cdc2c5.:head(25726'1664129 client.8168902.0:607753 dirty|data_digest s 1713351 uv 1664129 dd cd130b55 alloc_hint [0 0 0])
> /var/log/ceph/ceph-osd.126.log:931:2017-07-03 08:07:55.331491 7f95a73eb700 -1 log_channel(cluster) log [ERR] : 1.15f soid 1:fa86fe35:::10006cdc2c5.:head: failed to pick suitable auth object
> /var/log/ceph/ceph-osd.126.log:932:2017-07-03 08:08:27.605139 7f95a4be6700 -1 log_channel(cluster) log [ERR] : 1.15f repair 3 errors, 0 fixed
>
> Post removing invalid object:
>
> /var/log/ceph/ceph-osd.126.log:3433:2017-07-03 08:37:03.780584 7f95a73eb700 -1 log_channel(cluster) log [ERR] : 1.15f shard 63: soid 1:fa86fe35:::10006cdc2c5.:head data_digest 0x931041e9 != data_digest 0xcd130b55 from auth oi 1:fa86fe35:::10006cdc2c5.:head(25726'1664129 client.8168902.0:607753 dirty|data_digest s 1713351 uv 1664129 dd cd130b55 alloc_hint [0 0 0])
> /var/log/ceph/ceph-osd.126.log:3434:2017-07-03 08:37:03.780591 7f95a73eb700 -1 log_channel(cluster) log [ERR] : 1.15f shard 126: soid 1:fa86fe35:::10006cdc2c5.:head data_digest 0x931041e9 != data_digest 0xcd130b55 from auth oi 1:fa86fe35:::10006cdc2c5.:head(25726'1664129 client.8168902.0:607753 dirty|data_digest s 1713351 uv 1664129 dd cd130b55 alloc_hint [0 0 0])
> /var/log/ceph/ceph-osd.126.log:3435:2017-07-03 08:37:03.780593 7f95a73eb700 -1 log_channel(cluster) log [ERR] : 1.15f shard 143 missing 1:fa86fe35:::10006cdc2c5.:head
> /var/log/ceph/ceph-osd.126.log:3436:2017-07-03 08:37:03.780594 7f95a73eb700 -1 log_channel(cluster) log [ERR] : 1.15f soid 1:fa86fe35:::10006cdc2c5.:head: failed to pick suitable auth object
> /var/log/ceph/ceph-osd.126.log:3437:2017-07-03 08:37:39.278991 7f95a4be6700 -1 log_channel(cluster) log [ERR] : 1.15f repair 3 errors, 0 fixed
>
> Is it possible this thread is related to the error we are seeing?
>
> Rhian Resnick
> Assistant Director Middleware and HPC
> Office of Information Technology
> Florida Atlantic University
> 777 Glades Road, CM22, Rm 173B
> Boca Raton, FL 33431
> Phone 561.297.2647 Fax 561.297.0222
>
> From: ceph-users <ceph-users-boun...@lists.ceph.com> on behalf of Gregory Farnum <gfar...@redhat.com>
> Sent: Monday, May 15, 2017 6:28 PM
> To: Lincoln Bryant; Weil, Sage
> Cc: ceph-users
> Subject: Re: [ceph-users] Inconsistent pgs with size_mismatch_oi
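For anyone hitting the same thing, a minimal way to see exactly what scrub flagged before retrying the repair would be something like the following (a sketch only, assuming the PG 1.15f from the logs above and a Jewel-or-later release where rados list-inconsistent-obj is available):

    # Which PGs are currently flagged inconsistent
    ceph health detail | grep inconsistent

    # Per-object detail of the disagreement (shard digests, sizes, missing copies)
    rados list-inconsistent-obj 1.15f --format=json-pretty

    # Re-run a deep scrub, then ask the primary to repair
    ceph pg deep-scrub 1.15f
    ceph pg repair 1.15f

On a cache tier it can also help to flush or evict the affected object first (rados -p <cachepool> cache-flush-evict-all, or per-object cache-flush / cache-evict) so the repair sees the authoritative copy; treat this as a sketch rather than a guaranteed fix.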
Re: [ceph-users] Inconsistent pgs with size_mismatch_oi
Sorry to bring up an old post, but on Kraken I am unable to repair a PG that is inconsistent in a cache tier. We removed the bad object but are still seeing the following error in the OSD's logs.

Prior to removing invalid object:

/var/log/ceph/ceph-osd.126.log:928:2017-07-03 08:07:55.331479 7f95a73eb700 -1 log_channel(cluster) log [ERR] : 1.15f shard 63: soid 1:fa86fe35:::10006cdc2c5.:head data_digest 0x931041e9 != data_digest 0xcd130b55 from auth oi 1:fa86fe35:::10006cdc2c5.:head(25726'1664129 client.8168902.0:607753 dirty|data_digest s 1713351 uv 1664129 dd cd130b55 alloc_hint [0 0 0])
/var/log/ceph/ceph-osd.126.log:929:2017-07-03 08:07:55.331483 7f95a73eb700 -1 log_channel(cluster) log [ERR] : 1.15f shard 126: soid 1:fa86fe35:::10006cdc2c5.:head data_digest 0x931041e9 != data_digest 0xcd130b55 from auth oi 1:fa86fe35:::10006cdc2c5.:head(25726'1664129 client.8168902.0:607753 dirty|data_digest s 1713351 uv 1664129 dd cd130b55 alloc_hint [0 0 0])
/var/log/ceph/ceph-osd.126.log:930:2017-07-03 08:07:55.331487 7f95a73eb700 -1 log_channel(cluster) log [ERR] : 1.15f shard 143: soid 1:fa86fe35:::10006cdc2c5.:head data_digest 0x931041e9 != data_digest 0xcd130b55 from auth oi 1:fa86fe35:::10006cdc2c5.:head(25726'1664129 client.8168902.0:607753 dirty|data_digest s 1713351 uv 1664129 dd cd130b55 alloc_hint [0 0 0])
/var/log/ceph/ceph-osd.126.log:931:2017-07-03 08:07:55.331491 7f95a73eb700 -1 log_channel(cluster) log [ERR] : 1.15f soid 1:fa86fe35:::10006cdc2c5.:head: failed to pick suitable auth object
/var/log/ceph/ceph-osd.126.log:932:2017-07-03 08:08:27.605139 7f95a4be6700 -1 log_channel(cluster) log [ERR] : 1.15f repair 3 errors, 0 fixed

Post removing invalid object:

/var/log/ceph/ceph-osd.126.log:3433:2017-07-03 08:37:03.780584 7f95a73eb700 -1 log_channel(cluster) log [ERR] : 1.15f shard 63: soid 1:fa86fe35:::10006cdc2c5.:head data_digest 0x931041e9 != data_digest 0xcd130b55 from auth oi 1:fa86fe35:::10006cdc2c5.:head(25726'1664129 client.8168902.0:607753 dirty|data_digest s 1713351 uv 1664129 dd cd130b55 alloc_hint [0 0 0])
/var/log/ceph/ceph-osd.126.log:3434:2017-07-03 08:37:03.780591 7f95a73eb700 -1 log_channel(cluster) log [ERR] : 1.15f shard 126: soid 1:fa86fe35:::10006cdc2c5.:head data_digest 0x931041e9 != data_digest 0xcd130b55 from auth oi 1:fa86fe35:::10006cdc2c5.:head(25726'1664129 client.8168902.0:607753 dirty|data_digest s 1713351 uv 1664129 dd cd130b55 alloc_hint [0 0 0])
/var/log/ceph/ceph-osd.126.log:3435:2017-07-03 08:37:03.780593 7f95a73eb700 -1 log_channel(cluster) log [ERR] : 1.15f shard 143 missing 1:fa86fe35:::10006cdc2c5.:head
/var/log/ceph/ceph-osd.126.log:3436:2017-07-03 08:37:03.780594 7f95a73eb700 -1 log_channel(cluster) log [ERR] : 1.15f soid 1:fa86fe35:::10006cdc2c5.:head: failed to pick suitable auth object
/var/log/ceph/ceph-osd.126.log:3437:2017-07-03 08:37:39.278991 7f95a4be6700 -1 log_channel(cluster) log [ERR] : 1.15f repair 3 errors, 0 fixed

Is it possible this thread is related to the error we are seeing?
Rhian Resnick
Assistant Director Middleware and HPC
Office of Information Technology
Florida Atlantic University
777 Glades Road, CM22, Rm 173B
Boca Raton, FL 33431
Phone 561.297.2647 Fax 561.297.0222

From: ceph-users <ceph-users-boun...@lists.ceph.com> on behalf of Gregory Farnum <gfar...@redhat.com>
Sent: Monday, May 15, 2017 6:28 PM
To: Lincoln Bryant; Weil, Sage
Cc: ceph-users
Subject: Re: [ceph-users] Inconsistent pgs with size_mismatch_oi

On Mon, May 15, 2017 at 3:19 PM Lincoln Bryant <linco...@uchicago.edu> wrote:

Hi Greg,

Curiously, some of these scrub errors went away on their own. The example pg in the original post is now active+clean, and nothing interesting in the logs:

# zgrep "36.277b" ceph-osd.244*gz
ceph-osd.244.log-20170510.gz:2017-05-09 06:56:40.739855 7f0184623700 0 log_channel(cluster) log [INF] : 36.277b scrub starts
ceph-osd.244.log-20170510.gz:2017-05-09 06:58:01.872484 7f0186e28700 0 log_channel(cluster) log [INF] : 36.277b scrub ok
ceph-osd.244.log-20170511.gz:2017-05-10 20:40:47.536974 7f0186e28700 0 log_channel(cluster) log [INF] : 36.277b scrub starts
ceph-osd.244.log-20170511.gz:2017-05-10 20:41:38.399614 7f0184623700 0 log_channel(cluster) log [INF] : 36.277b scrub ok
ceph-osd.244.log-20170514.gz:2017-05-13 20:49:47.063789 7f0186e28700 0 log_channel(cluster) log [INF] : 36.277b scrub starts
ceph-osd.244.log-20170514.gz:2017-05-13 20:50:42.085718 7f0186e28700 0 log_channel(cluster) log [INF] : 36.277b scrub ok
ceph-osd.244.log-20170515.gz:2017-05-15 00:10:39.417578 7f0184623700 0 log_channel(cluster) log [INF] : 36.277b scrub starts
ceph-osd.244.log-20170515.gz:2017-05-15 00:11:26.189777 7f0186
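If scrub errors appear to clear on their own like this, one way to confirm the PG really is clean (commands assume the PG 36.277b and OSD 244 from Lincoln's logs above) is to force a fresh deep scrub and watch the primary's log for the starts/ok pair:

    # Force a new deep scrub of the PG in question
    ceph pg deep-scrub 36.277b

    # Check the recorded scrub timestamps/state for that PG
    ceph pg 36.277b query | grep -E '"last_(deep_)?scrub'

    # Follow the primary OSD's log for the "scrub starts" / "scrub ok" pair
    zgrep "36.277b" /var/log/ceph/ceph-osd.244.log*gz | grep -E 'scrub (starts|ok)'

This is only a sketch; a plain scrub does not recompute object checksums, so a deep scrub is the one that would re-surface a digest mismatch.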
Re: [ceph-users] Odd latency numbers
Regarding OpenNebula, it is working, though we do find the network functionality less than flexible. We would prefer the orchestration layer allow each primary group to create a network infrastructure internally to meet their needs and then automatically provide NAT from one or more public IP addresses (think AWS and Azure). This doesn't seem to be implemented at this time and will likely require manual intervention per group of users to resolve. Otherwise we like the software and find it much more lightweight than OpenStack. We need a tool that can be managed by a very small team and OpenNebula meets that goal.

Thanks for checking out this data for our test cluster. It isn't production, so yes, we are throwing spaghetti at the wall trying to make sure we are able to handle issues as they come up. We already planned to increase the pg count and have done so. (thanks)

Here is our osd tree. As this is a test cluster we are currently sharing the OSD disks between the cache tier (replica 3) and data (erasure); some more hardware is on the way so we can test using SSDs. We have been reviewing atop, iostat, sar, and our SNMP monitoring (not granular enough) and have confirmed the disks on this particular node are under a higher load than the others. We will likely take the time to deploy graphite since it will help with another project also.

One speculation that was discussed this morning is a bad cache battery on the PERC card in ceph-mon1, which could explain the +10 ms latency we see on all four drives. (That wouldn't be Ceph at all in this case.)

ID WEIGHT  TYPE NAME          UP/DOWN REWEIGHT PRIMARY-AFFINITY
-1 3.12685 root default
-2 1.08875     host ceph-mon1
 0 0.27219         osd.0           up  1.0      1.0
 1 0.27219         osd.1           up  1.0      1.0
 2 0.27219         osd.2           up  1.0      1.0
 4 0.27219         osd.4           up  1.0      1.0
-3 0.94936     host ceph-mon2
 3 0.27219         osd.3           up  1.0      1.0
 5 0.27219         osd.5           up  1.0      1.0
 7 0.27219         osd.7           up  1.0      1.0
 9 0.13280         osd.9           up  1.0      1.0
-4 1.08875     host ceph-mon3
 6 0.27219         osd.6           up  1.0      1.0
 8 0.27219         osd.8           up  1.0      1.0
10 0.27219         osd.10          up  1.0      1.0
11 0.27219         osd.11          up  1.0      1.0

Rhian Resnick
Assistant Director Middleware and HPC
Office of Information Technology
Florida Atlantic University
777 Glades Road, CM22, Rm 173B
Boca Raton, FL 33431
Phone 561.297.2647 Fax 561.297.0222

From: Christian Balzer <ch...@gol.com>
Sent: Wednesday, March 15, 2017 8:31 PM
To: ceph-users@lists.ceph.com
Cc: Rhian Resnick
Subject: Re: [ceph-users] Odd latency numbers

Hello,

On Wed, 15 Mar 2017 16:49:00 + Rhian Resnick wrote:

> Morning all,
>
> We are starting to apply load to our test cephfs system and are noticing some odd
> latency numbers. We are using erasure coding for the cold data pools and
> replication for our cache tiers (not on SSD yet). We noticed the
> following high latency on one node and it seems to be slowing down writes and
> reads on the cluster.
>
The pg dump below was massive overkill at this point in time, whereas a "ceph osd tree" would have probably shown us the topology (where is your tier, where your EC pool(s)?). Same for a "ceph osd pool ls detail".

So if we were to assume that node is your cache tier (replica 1?), then the latencies would make sense. But that's guesswork, so describe your cluster in more detail.

And yes, a single slow OSD (stealthily failing drive, etc) can bring a cluster to its knees. This is why many people here tend to get every last bit of info with collectd and feed it into carbon and graphite/grafana, etc.
This will immediately indicate culprits and allow you to correlate this with other data, like actual disk/network/cpu load, etc.

For the time being run atop on that node and see if you can reduce the issue to something like "all disks are busy all the time" or "CPU meltdown".

>
> Our next step is to break out mds, mgr, and mons to different machines but we
> wanted to start the discussion here.
>
If your nodes (not a single iota of HW/NW info from you) are powerful enough, breaking out stuff isn't likely to help or be a necessity. More below.

>
> Here is a bunch of information you may find useful.
>
> ceph.conf
>
> [global]
> fsid = X
> mon_initial_members = ceph-mon1, ceph-mon2, ceph-mon3
> mon_host = 10.141.167.238,10.141.160.251,10.141.161.249
> auth_cluster_required = cephx
> auth_service_requ
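As a concrete starting point for the slow-OSD hunt Christian describes, something along these lines usually narrows it down quickly (the first two commands are standard Ceph/Linux tooling; the battery checks only apply to the PERC cache-battery theory above and assume Dell OMSA or LSI MegaCLI is installed on the node):

    # Per-OSD commit/apply latency as the cluster sees it; one OSD standing out
    # across several samples is a strong hint
    ceph osd perf

    # On the suspect host: per-disk utilisation, await and queue depth
    iostat -x 5

    # Optional: query the RAID controller's write-back cache battery
    omreport storage battery
    megacli -AdpBbuCmd -GetBbuStatus -aALL

A degraded or missing BBU typically forces the controller into write-through mode, which would show up as uniformly higher latency on every drive behind that controller, exactly the pattern described above.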
[ceph-users] Odd latency numbers
[Truncated "ceph pg dump" output: the per-PG rows are omitted here; all listed PGs (pool 0) were active+clean, last scrubbed between 2017-03-12 and 2017-03-15. The per-pool and per-OSD summary sections follow, with all-zero columns dropped.]

POOL  OBJECTS   BYTES          LOG     DISKLOG
5     0         0              0       0
4     0         0              0       0
3     52275     48371113       36638   36638
2     650158    426305171964   36623   36623
1     466451    79835701754    36540   36540
0     0         0              0       0
sum   1168884   506189244831   109801  109801

OSD_STAT  USED    AVAIL  TOTAL  HB_PEERS                 PG_SUM
11        56284M  223G   278G   [0,1,2,3,4,5,7,8,9,10]   28
10        97062M  183G   278G   [0,1,2,4,5,6,7,8,9,11]   30
0         41564M  238G   278G   [1,3,4,5,6,7,8,9,10,11]  22
1         123G    154G   278G   [0,2,3,5,6,7,8,9,10,11]  42
2         112G    166G   278G   [1,3,4,5,6,7,8,9,10,11]  28
4         47643M  232G   278G   [0,1,3,5,6,7,8,9,10,11]  32
3         97557M  183G   278G   [0,1,2,4,6,7,8,9,10,11]  32
5         127G    151G   278G   [0,1,2,4,6,7,8,9,10,11]  31
6         71151M  209G   278G   [1,2,3,4,5,7,8,9,10,11]  32
7         79459M  201G   278G   [0,1,2,4,5,6,8,9,10,11]  40
9         23961M  112G   136G   [0,1,2,4,5,6,7,8,10,11]  21
8         104G    174G   278G   [0,1,2,3,4,5,7,9,10,11]  34
sum       970G    2231G  3202G

Rhian Resnick
Assistant Director Middleware and HPC
Office of Information Technology
Florida Atlantic University
777 Glades Road
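For what it's worth, the same per-OSD picture is easier to get without wading through a full pg dump; two standard (if version-dependent) commands that print just the summaries:

    # Utilisation per OSD, laid out by host (available since Hammer-era releases)
    ceph osd df tree

    # Only the per-OSD summary section of pg dump, without every PG row
    ceph pg dump osds

This keeps list posts readable and still shows the imbalance between OSDs that the thread is chasing.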
Re: [ceph-users] cephfs and erasure coding
Thanks everyone for the input. We are online in our test environment and are running user workflows to make sure everything is running as expected.

Rhian

From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of Rhian Resnick
Sent: Thursday, March 9, 2017 8:31 AM
To: Maxime Guyot <maxime.gu...@elits.com>
Cc: ceph-us...@ceph.com
Subject: Re: [ceph-users] cephfs and erasure coding

Thanks for the confirmations of what is possible. We plan on creating a new file system, rsyncing the data over, and deleting the old one.

Rhian

On Mar 9, 2017 2:27 AM, Maxime Guyot <maxime.gu...@elits.com> wrote:

Hi,

> "The answer as to how to move an existing cephfs pool from replication to
> erasure coding (and vice versa) is to create the new pool and rsync your data
> between them."

Shouldn't it be possible to just do the "ceph osd tier add ecpool cachepool && ceph osd tier cache-mode cachepool writeback" and let Ceph redirect the requests (CephFS or other) to the cache pool?

Cheers,
Maxime

From: ceph-users <ceph-users-boun...@lists.ceph.com> on behalf of David Turner <david.tur...@storagecraft.com>
Date: Wednesday 8 March 2017 22:27
To: Rhian Resnick <rresn...@fau.edu>, "ceph-us...@ceph.com" <ceph-us...@ceph.com>
Subject: Re: [ceph-users] cephfs and erasure coding

I use CephFS on erasure coding at home using a cache tier. It works fine for my use case, but we know nothing about your use case to know if it will work well for you. The answer as to how to move an existing cephfs pool from replication to erasure coding (and vice versa) is to create the new pool and rsync your data between them.

David Turner | Cloud Operations Engineer | StorageCraft Technology Corporation
380 Data Drive Suite 300 | Draper | Utah | 84020
Office: 801.871.2760 | Mobile: 385.224.2943

From: ceph-users [ceph-users-boun...@lists.ceph.com] on behalf of Rhian Resnick [rresn...@fau.edu]
Sent: Wednesday, March 08, 2017 12:54 PM
To: ceph-us...@ceph.com
Subject: [ceph-users] cephfs and erasure coding

Two questions on Cephfs and erasure coding that Google couldn't answer.

1) How well does cephfs work with erasure coding?

2) How would you move an existing cephfs pool that uses replication to erasure coding?

Rhian Resnick
Assistant Director Middleware and HPC
Office of Information Technology
Florida Atlantic University
777 Glades Road, CM22, Rm 173B
Boca Raton, FL 33431
Phone 561.297.2647 Fax 561.297.0222
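For reference, the cache-tier-over-EC layout David and Maxime describe looks roughly like this on a pre-Luminous cluster. Pool names, PG counts and the k/m profile below are made-up examples, not anything from this thread:

    # EC profile and the two pools
    ceph osd erasure-code-profile set ec42 k=4 m=2
    ceph osd pool create cephfs_data_ec 1024 1024 erasure ec42
    ceph osd pool create cephfs_cache 256 256 replicated

    # Replicated pool as a writeback cache in front of the EC pool
    ceph osd tier add cephfs_data_ec cephfs_cache
    ceph osd tier cache-mode cephfs_cache writeback
    ceph osd tier set-overlay cephfs_data_ec cephfs_cache
    ceph osd pool set cephfs_cache hit_set_type bloom
    ceph osd pool set cephfs_cache target_max_bytes 100000000000

CephFS then only ever talks to cephfs_data_ec; the cache tier absorbs the partial overwrites that EC pools could not take directly before Luminous.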
[ceph-users] cephfs and erasure coding
Two questions on Cephfs and erasure coding that Google couldn't answer.

1) How well does cephfs work with erasure coding?

2) How would you move an existing cephfs pool that uses replication to erasure coding?

Rhian Resnick
Assistant Director Middleware and HPC
Office of Information Technology
Florida Atlantic University
777 Glades Road, CM22, Rm 173B
Boca Raton, FL 33431
Phone 561.297.2647 Fax 561.297.0222
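On question 2, besides the new-filesystem-plus-rsync route discussed above, a hedged sketch of the in-place variant looks like this. The names are placeholders, and it assumes the EC pool already has its cache tier attached as in the previous thread:

    # Make the new pool usable by the existing filesystem
    ceph fs add_data_pool cephfs cephfs_data_ec

    # New files written under this directory land in the new pool
    mkdir /mnt/cephfs/newdata
    setfattr -n ceph.dir.layout.pool -v cephfs_data_ec /mnt/cephfs/newdata

    # Copy existing data across, then remove the old copies once verified
    rsync -a /mnt/cephfs/olddata/ /mnt/cephfs/newdata/

Changing a directory layout never rewrites existing objects, which is why the rsync (and eventual deletion of the old tree) is still needed either way.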
Re: [ceph-users] Cephfs with large numbers of files per directory
Logan,

Thank you for the feedback.

Rhian Resnick
Assistant Director Middleware and HPC
Office of Information Technology
Florida Atlantic University
777 Glades Road, CM22, Rm 173B
Boca Raton, FL 33431
Phone 561.297.2647 Fax 561.297.0222

From: Logan Kuhn <log...@wolfram.com>
Sent: Tuesday, February 21, 2017 8:42 AM
To: Rhian Resnick
Cc: ceph-us...@ceph.com
Subject: Re: [ceph-users] Cephfs with large numbers of files per directory

We had a very similar configuration at one point. I was fairly new when we started to move away from it, but what happened to us is that any time a directory needed a stat, backup, ls, rsync, etc., it would take minutes to return, and while it was waiting CPU load would spike due to iowait. The difference between what you've said and what we did is that we used a gateway machine, so the actual cluster never had any issues with it. This was also on Infernalis, so things have probably changed in Jewel and Kraken.

Regards,
Logan

- On Feb 21, 2017, at 7:37 AM, Rhian Resnick <rresn...@fau.edu> wrote:

Good morning,

We are currently investigating using Ceph for a KVM farm, block storage and possibly file systems (cephfs with ceph-fuse, and ceph hadoop). Our cluster will be composed of 4 nodes, ~240 OSDs, and 4 monitors providing mon and mds as required.

What experience has the community had with large numbers of files in a single directory (500,000 to 5 million)? We know that directory fragmentation will be required but are concerned about the stability of the implementation.

Your opinions and suggestions are welcome.

Thank you

Rhian Resnick
Assistant Director Middleware and HPC
Office of Information Technology
Florida Atlantic University
777 Glades Road, CM22, Rm 173B
Boca Raton, FL 33431
Phone 561.297.2647 Fax 561.297.0222
[ceph-users] Cephfs with large numbers of files per directory
Good morning,

We are currently investigating using Ceph for a KVM farm, block storage and possibly file systems (cephfs with ceph-fuse, and ceph hadoop). Our cluster will be composed of 4 nodes, ~240 OSDs, and 4 monitors providing mon and mds as required.

What experience has the community had with large numbers of files in a single directory (500,000 to 5 million)? We know that directory fragmentation will be required but are concerned about the stability of the implementation.

Your opinions and suggestions are welcome.

Thank you

Rhian Resnick
Assistant Director Middleware and HPC
Office of Information Technology
Florida Atlantic University
777 Glades Road, CM22, Rm 173B
Boca Raton, FL 33431
Phone 561.297.2647 Fax 561.297.0222
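For completeness: on Jewel and Kraken the directory fragmentation mentioned above was still opt-in per filesystem, so a setup like this would need something along these lines (flag name as documented for that era; Luminous and later enable fragmentation by default):

    # Allow the MDS to split large directories into fragments
    ceph fs set cephfs allow_dirfrags true

    # Optional: tune the split threshold (default 10000 dentries), e.g. in ceph.conf:
    # [mds]
    # mds_bal_split_size = 30000

This is a sketch under the assumption that the filesystem is named "cephfs"; the split-size value shown is illustrative only.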