Re: [ceph-users] Ceph MDS continually respawning (hammer)
On Thu, May 28, 2015 at 1:04 AM, Kenneth Waegeman kenneth.waege...@ugent.be wrote:
> On 05/27/2015 10:30 PM, Gregory Farnum wrote:
>> On Wed, May 27, 2015 at 6:49 AM, Kenneth Waegeman kenneth.waege...@ugent.be wrote:
>>> We are also running a full backup sync to cephfs, using multiple distributed rsync streams (with zkrsync), and also ran into this issue today on Hammer 0.94.1. After setting the beacon higher, and eventually clearing the journal, it stabilized again. We were using ceph-fuse to mount the cephfs, not the ceph kernel client.
>>
>> What's your MDS cache size set to?
>
> I did set it to 100 before (we have 64G of ram for the mds) trying to get rid of the 'Client .. failing to respond to cache pressure' messages.

Oh, that's definitely enough if one client is eating it all up to run into this, without that patch I referenced. :)
-Greg

>> Did you have any warnings in the ceph log about clients not releasing caps?
>
> Unfortunately lost the logs of before it happened.. But nothing in the new logs about that, I will follow this up.
>
>> I think you could hit this in ceph-fuse as well on hammer, although we just merged in a fix: https://github.com/ceph/ceph/pull/4653
>> -Greg
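For anyone following along: the cache limit being discussed is the mds cache size option, which is counted in inodes held in the MDS cache, not bytes. A minimal sketch of how it can be set, with an illustrative value only (not a recommendation, and not necessarily the value Kenneth used):

# in ceph.conf on the MDS host
[mds]
    mds cache size = 1000000

# or injected at runtime, in the same style used elsewhere in this thread
ceph mds tell 0 injectargs '--mds-cache-size 1000000'

A larger cache quiets the 'failing to respond to cache pressure' warnings, but as Greg notes it also lets a single busy client pin much more state in the MDS.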
Re: [ceph-users] Ceph MDS continually respawning (hammer)
On 05/27/2015 10:30 PM, Gregory Farnum wrote:
> On Wed, May 27, 2015 at 6:49 AM, Kenneth Waegeman kenneth.waege...@ugent.be wrote:
>> We are also running a full backup sync to cephfs, using multiple distributed rsync streams (with zkrsync), and also ran into this issue today on Hammer 0.94.1. After setting the beacon higher, and eventually clearing the journal, it stabilized again. We were using ceph-fuse to mount the cephfs, not the ceph kernel client.
>
> What's your MDS cache size set to?

I did set it to 100 before (we have 64G of ram for the mds) trying to get rid of the 'Client .. failing to respond to cache pressure' messages.

> Did you have any warnings in the ceph log about clients not releasing caps?

Unfortunately lost the logs of before it happened.. But nothing in the new logs about that, I will follow this up.

> I think you could hit this in ceph-fuse as well on hammer, although we just merged in a fix: https://github.com/ceph/ceph/pull/4653
> -Greg
Re: [ceph-users] Ceph MDS continually respawning (hammer)
We are also running a full backup sync to cephfs, using multiple distributed rsync streams (with zkrsync), and also ran into this issue today on Hammer 0.94.1. After setting the beacon higher, and eventually clearing the journal, it stabilized again. We were using ceph-fuse to mount the cephfs, not the ceph kernel client.

On 05/25/2015 11:55 AM, Yan, Zheng wrote:
> the kernel client bug should be fixed by https://github.com/ceph/ceph-client/commit/72f22efb658e6f9e126b2b0fcb065f66ffd02239
Re: [ceph-users] Ceph MDS continually respawning (hammer)
On Wed, May 27, 2015 at 6:49 AM, Kenneth Waegeman kenneth.waege...@ugent.be wrote:
> We are also running a full backup sync to cephfs, using multiple distributed rsync streams (with zkrsync), and also ran into this issue today on Hammer 0.94.1. After setting the beacon higher, and eventually clearing the journal, it stabilized again. We were using ceph-fuse to mount the cephfs, not the ceph kernel client.

What's your MDS cache size set to? Did you have any warnings in the ceph log about clients not releasing caps?

I think you could hit this in ceph-fuse as well on hammer, although we just merged in a fix: https://github.com/ceph/ceph/pull/4653
-Greg
Re: [ceph-users] Ceph MDS continually respawning (hammer)
On Fri, May 22, 2015 at 12:45 PM, Adam Tygart mo...@ksu.edu wrote:
> Fair enough. Anyway, is it safe to now increase the 'mds beacon grace' to try and get the mds server functional again?

Yep! Let us know how it goes...

> I realize there is nothing simple about the things that are being accomplished here, and thank everyone for their hard work on making this stuff work as well as it does.

:)
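For reference, a rough sketch of how the beacon grace can be raised while the MDS replays; the value here is purely illustrative (it just needs to comfortably exceed the expected replay time), and the runtime injection commands are an assumption about how one might apply it without restarting daemons:

# in ceph.conf (picked up on daemon restart)
[global]
    mds beacon grace = 600

# or injected at runtime on the mons and the MDS
ceph tell mon.* injectargs '--mds-beacon-grace 600'
ceph mds tell 0 injectargs '--mds-beacon-grace 600'

The mons use this value to decide when a silent MDS is considered failed, so it generally needs to be visible to them as well as to the MDS itself.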
Re: [ceph-users] Ceph MDS continually respawning (hammer)
On 22/05/2015 20:06, Gregory Farnum wrote:
> Ugh. We appear to be trying to allocate too much memory for this event in the journal dump; we'll need to fix this. :(

It's not even per-event, it tries to load the entire journal into memory in one go. This is a hangover from the old Dumper/Resetter code that I pulled into cephfs-journal-tool.

http://tracker.ceph.com/issues/11746

Cheers,
John
Re: [ceph-users] Ceph MDS continually respawning (hammer)
Fair enough. Anyway, is it safe to now increase the 'mds beacon grace' to try and get the mds server functional again?

I realize there is nothing simple about the things that are being accomplished here, and thank everyone for their hard work on making this stuff work as well as it does.

--
Adam

On Fri, May 22, 2015 at 2:39 PM, Gregory Farnum g...@gregs42.com wrote:
> On Fri, May 22, 2015 at 12:34 PM, Adam Tygart mo...@ksu.edu wrote:
>> [...]
>> I would have hoped that if the files hadn't been touched in days that *something* would have forcibly closed them and flushed them out of the caches. I also would have thought that a close file/flush in rsync (which I am sure it does, after finishing writing a file) would have let them close in the cephfs session.
>
> This is trickier than it sounds with the kernel dcache, unfortunately. We improved it a bit just last week but we'll have to try and diagnose what happened in this case more before we can say if it was that issue or something else.
> -Greg
> [...]
Re: [ceph-users] Ceph MDS continually respawning (hammer)
Alright, bumping that up 10 worked. The MDS server came up and recovered. Took about 1 minute. Thanks again, guys.

--
Adam

On Fri, May 22, 2015 at 2:50 PM, Gregory Farnum g...@gregs42.com wrote:
> On Fri, May 22, 2015 at 12:45 PM, Adam Tygart mo...@ksu.edu wrote:
>> Fair enough. Anyway, is it safe to now increase the 'mds beacon grace' to try and get the mds server functional again?
>
> Yep! Let us know how it goes...
>
>> I realize there is nothing simple about the things that are being accomplished here, and thank everyone for their hard work on making this stuff work as well as it does.
>
> :)
Re: [ceph-users] Ceph MDS continually respawning (hammer)
Hi Adam,

You can get the MDS to spit out more debug information like so:

# ceph mds tell 0 injectargs '--debug-mds 20 --debug-ms 1'

At least then you can see where it's at when it crashes.

--Lincoln

On May 22, 2015, at 9:33 AM, Adam Tygart wrote:
> Hello all,
>
> The ceph-mds servers in our cluster are performing a constant boot->replay->crash cycle in our systems. I have enabled debug logging for the mds for a restart cycle on one of the nodes [1].
>
> Kernel debug from cephfs client during reconnection attempts:
>
> [732586.352173] ceph: mdsc delayed_work
> [732586.352178] ceph: check_delayed_caps
> [732586.352182] ceph: lookup_mds_session 88202f01c000 210
> [732586.352185] ceph: mdsc get_session 88202f01c000 210 -> 211
> [732586.352189] ceph: send_renew_caps ignoring mds0 (up:replay)
> [732586.352192] ceph: add_cap_releases 88202f01c000 mds0 extra 680
> [732586.352195] ceph: mdsc put_session 88202f01c000 211 -> 210
> [732586.352198] ceph: mdsc delayed_work
> [732586.352200] ceph: check_delayed_caps
> [732586.352202] ceph: lookup_mds_session 881036cbf800 1
> [732586.352205] ceph: mdsc get_session 881036cbf800 1 -> 2
> [732586.352207] ceph: send_renew_caps ignoring mds0 (up:replay)
> [732586.352210] ceph: add_cap_releases 881036cbf800 mds0 extra 680
> [732586.352212] ceph: mdsc put_session 881036cbf800 2 -> 1
> [732591.357123] ceph: mdsc delayed_work
> [732591.357128] ceph: check_delayed_caps
> [732591.357132] ceph: lookup_mds_session 88202f01c000 210
> [732591.357135] ceph: mdsc get_session 88202f01c000 210 -> 211
> [732591.357139] ceph: add_cap_releases 88202f01c000 mds0 extra 680
> [732591.357142] ceph: mdsc put_session 88202f01c000 211 -> 210
> [732591.357145] ceph: mdsc delayed_work
> [732591.357147] ceph: check_delayed_caps
> [732591.357149] ceph: lookup_mds_session 881036cbf800 1
> [732591.357152] ceph: mdsc get_session 881036cbf800 1 -> 2
> [732591.357154] ceph: add_cap_releases 881036cbf800 mds0 extra 680
> [732591.357157] ceph: mdsc put_session 881036cbf800 2 -> 1
> [732596.362076] ceph: mdsc delayed_work
> [732596.362081] ceph: check_delayed_caps
> [732596.362084] ceph: lookup_mds_session 88202f01c000 210
> [732596.362087] ceph: mdsc get_session 88202f01c000 210 -> 211
> [732596.362091] ceph: add_cap_releases 88202f01c000 mds0 extra 680
> [732596.362094] ceph: mdsc put_session 88202f01c000 211 -> 210
> [732596.362097] ceph: mdsc delayed_work
> [732596.362099] ceph: check_delayed_caps
> [732596.362101] ceph: lookup_mds_session 881036cbf800 1
> [732596.362104] ceph: mdsc get_session 881036cbf800 1 -> 2
> [732596.362106] ceph: add_cap_releases 881036cbf800 mds0 extra 680
> [732596.362109] ceph: mdsc put_session 881036cbf800 2 -> 1
>
> Anybody have any debugging tips, or have any ideas on how to get an mds stable?
>
> Server info: CentOS 7.1 with Ceph 0.94.1
> Client info: Gentoo, kernel cephfs. 3.19.5-gentoo
>
> I'd reboot the client, but at this point, I don't believe this is a client issue.
>
> [1] https://drive.google.com/file/d/0B4XF1RWjuGh5WU1OZXpNb0Z1ck0/view?usp=sharing
>
> --
> Adam
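For completeness, the same debug settings can be made persistent across MDS restarts (useful here, since the daemon keeps respawning); a small sketch, assuming the standard ceph.conf layout:

# in ceph.conf on the MDS hosts
[mds]
    debug mds = 20
    debug ms = 1

Injected values are lost when the MDS respawns, so putting them in ceph.conf ensures the next crash cycle's replay is captured in the log too.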
Re: [ceph-users] Ceph MDS continually respawning (hammer)
Thanks for the quick response. I had 'debug mds = 20' in the first log, I added 'debug ms = 1' for this one: https://drive.google.com/file/d/0B4XF1RWjuGh5bXFnRzE1SHF6blE/view?usp=sharing

Based on these logs, it looks like heartbeat_map is_healthy 'MDS' just times out and then the mds gets respawned.

--
Adam

On Fri, May 22, 2015 at 9:42 AM, Lincoln Bryant linco...@uchicago.edu wrote:
> Hi Adam,
>
> You can get the MDS to spit out more debug information like so:
>
> # ceph mds tell 0 injectargs '--debug-mds 20 --debug-ms 1'
>
> At least then you can see where it's at when it crashes.
>
> --Lincoln
>
> On May 22, 2015, at 9:33 AM, Adam Tygart wrote:
>> Hello all,
>>
>> The ceph-mds servers in our cluster are performing a constant boot->replay->crash cycle in our systems. I have enabled debug logging for the mds for a restart cycle on one of the nodes [1].
>>
>> [...]
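If anyone else needs to confirm the same symptom, the heartbeat timeouts Adam describes show up in the MDS log; something along these lines should surface them (assuming the default log location):

grep heartbeat_map /var/log/ceph/ceph-mds.*.log | tail -n 20

Seeing repeated "heartbeat_map is_healthy ... had timed out" lines right before the respawn points at the MDS being busy (for example in replay) rather than actually dead.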
Re: [ceph-users] Ceph MDS continually respawning (hammer)
I've experienced MDS issues in the past, but nothing sticks out to me in your logs. Are you using a single active MDS with failover, or multiple active MDS?

--Lincoln

On May 22, 2015, at 10:10 AM, Adam Tygart wrote:
> Thanks for the quick response. I had 'debug mds = 20' in the first log, I added 'debug ms = 1' for this one: https://drive.google.com/file/d/0B4XF1RWjuGh5bXFnRzE1SHF6blE/view?usp=sharing
>
> Based on these logs, it looks like heartbeat_map is_healthy 'MDS' just times out and then the mds gets respawned.
>
> --
> Adam
>
> [...]
Re: [ceph-users] Ceph MDS continually respawning (hammer)
I knew I forgot to include something with my initial e-mail. Single active with failover.

dumped mdsmap epoch 30608
epoch   30608
flags   0
created 2015-04-02 16:15:55.209894
modified        2015-05-22 11:39:15.992774
tableserver     0
root    0
session_timeout 60
session_autoclose       300
max_file_size   17592186044416
last_failure    30606
last_failure_osd_epoch  24298
compat  compat={},rocompat={},incompat={1=base v0.20,2=client writeable ranges,3=default file layouts on dirs,4=dir inode in separate object,5=mds uses versioned encoding,6=dirfrag is stored in omap,8=no anchor table}
max_mds 1
in      0
up      {0=20284976}
failed
stopped
data_pools      25
metadata_pool   27
inline_data     disabled
20285024:       10.5.38.2:7021/32024 'hobbit02' mds.-1.0 up:standby seq 1
20346784:       10.5.38.1:6957/223554 'hobbit01' mds.-1.0 up:standby seq 1
20284976:       10.5.38.13:6926/66700 'hobbit13' mds.0.1696 up:replay seq 1

--
Adam

On Fri, May 22, 2015 at 11:37 AM, Lincoln Bryant linco...@uchicago.edu wrote:
> I've experienced MDS issues in the past, but nothing sticks out to me in your logs. Are you using a single active MDS with failover, or multiple active MDS?
>
> --Lincoln
>
> [...]
Re: [ceph-users] Ceph MDS continually respawning (hammer)
On 22/05/2015 15:33, Adam Tygart wrote:
> Hello all,
>
> The ceph-mds servers in our cluster are performing a constant boot->replay->crash cycle in our systems. I have enabled debug logging for the mds for a restart cycle on one of the nodes [1].

You found a bug, or more correctly you probably found multiple bugs...

It looks like your journal contains an EOpen event that lists 5307092 open files. Because the MDS only drops its lock between events, not during processing a single one, this is causing the heartbeat map to think the MDS has locked up, so it's getting killed.

So firstly we have to fix this to have appropriate calls into MDS::heartbeat_reset while iterating over lists of unbounded length in EMetaBlob::replay. That would fix the false death of the MDS resulting from the heartbeat expiry.

Secondly, this EOpen was a 2.6GB log event. Something has almost certainly gone wrong when we see that data structure grow so large, so we should really be imposing some artificial cap there and catching the situation earlier, rather than journal this monster event and only hitting issues during replay.

Thirdly, something is apparently leading the MDS to think that 5 million files were open in this particular log segment. It seems like an improbable situation given that I can only see a single client in action here. More investigation needed to see how this happened. Can you describe the client workload that was going on in the run up to the system breaking?

Anyway, actions:

1. I'm assuming your metadata is not sensitive, as you have shared this debug log. Please could you use "cephfs-journal-tool journal export ~/journal.bin" to grab an offline copy of the raw journal, in case we need to look at it later (this might take a while since your journal seems so large, but the resulting file should compress reasonably well with tar cSzf).

2. Optimistically, you may be able to get out of this situation by modifying the mds_beacon_grace config option on the MDS (set it to something high). This will cause the MDS to continue sending beacons to the mons, even when a thread is failing to yield promptly (as in this case), thereby preventing the mons from regarding the MDS as failed. This hopefully will buy the MDS enough time to complete replay and come up, assuming it doesn't run out of memory in the process of dealing with whatever strangeness is in the journal.

3. If your MDS eventually makes it through recovery, unmount your client and use "ceph daemon mds.<id> flush journal" to flush and trim the journal: this should result in a situation where the next time the MDS starts, the oversized journal entries are no longer present and startup should go smoothly.

Cheers,
John
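Putting John's actions 1 and 3 together as concrete commands, a rough sketch (the output paths and the <id> placeholder are illustrative; take the export before changing anything else):

# action 1: offline copy of the raw journal, then compress it
cephfs-journal-tool journal export ~/journal.bin
tar cSzf journal.tar.gz ~/journal.bin

# action 3: only once the MDS is active again and the client is unmounted,
# flush and trim the journal through the admin socket on the MDS host
ceph daemon mds.<id> flush journal

Action 2 (raising mds_beacon_grace) is sketched earlier in the thread.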
Re: [ceph-users] Ceph MDS continually respawning (hammer)
I notice in both logs, the last entry before the MDS restart/failover is when the mds is replaying the journal and gets to /homes/gundimed/IPD/10kb/1e-500d/DisplayLog/

2015-05-22 09:59:19.116231 7f9d930c1700 10 mds.0.journal EMetaBlob.replay for [2,head] had [inode 13f8e31 [...2,head] /homes/gundimed/IPD/10kb/1e-500d/DisplayLog/ auth v20776 f(v0 m2015-05-22 02:34:09.00 357=357+0) n(v1 rc2015-05-22 02:34:09.00 b71340955004 358=357+1) (iversion lock) | dirfrag=1 dirty=1 0x6ded9c8]

2015-05-22 08:04:31.993007 7f87afb2f700 10 mds.0.journal EMetaBlob.replay for [2,head] had [inode 13f8e31 [...2,head] /homes/gundimed/IPD/10kb/1e-500d/DisplayLog/ auth v20776 f(v0 m2015-05-22 02:34:09.00 357=357+0) n(v1 rc2015-05-22 02:34:09.00 b71340955004 358=357+1) (iversion lock) | dirfrag=1 dirty=1 0x76a59c8]

Maybe there's some problem in this part of the journal? Or maybe that's the end of the journal and it crashes afterwards? No idea :( Hopefully one of the devs can weigh in.

--Lincoln

On May 22, 2015, at 11:40 AM, Adam Tygart wrote:
> I knew I forgot to include something with my initial e-mail. Single active with failover.
>
> [...]
Re: [ceph-users] Ceph MDS continually respawning (hammer)
On Fri, May 22, 2015 at 11:34 AM, Adam Tygart mo...@ksu.edu wrote:
> On Fri, May 22, 2015 at 11:47 AM, John Spray john.sp...@redhat.com wrote:
>> [...]
>> Can you describe the client workload that was going on in the run up to the system breaking?
>
> We are in the process of rsyncing 60+ million files from our old fileservers into the new cephfs stores. We are down to the last user, and it seems he has at least 25% of the total files. We were copying files at a rate of about 2 million a day before this crash.
>
>> Anyway, actions:
>>
>> 1. I'm assuming your metadata is not sensitive, as you have shared this debug log. Please could you use "cephfs-journal-tool journal export ~/journal.bin" to grab an offline copy of the raw journal [...]
>
> Problematic. cephfs-journal-tool is giving me a stacktrace: http://people.beocat.cis.ksu.edu/~mozes/ceph/cephfs-journal-tool-stacktrace.txt
>
>> 2. Optimistically, you may be able to get out of this situation by modifying the mds_beacon_grace config option on the MDS (set it to something high). [...]
>
> I can hold off on this for a bit if you have something else you'd like me to save while it is in its broken state.

Ugh. We appear to be trying to allocate too much memory for this event in the journal dump; we'll need to fix this. :(

In the meantime, it's a harder workflow, but if you can grab all the objects in the metadata pool whose name starts with "200." that's the raw data that the journal dumper is trying to grab, and we can assemble it ourselves for checking later on.

That said, you might actually have that many files still open in some fashion from your clients. If you can dump caches on them periodically, and/or limit the number of files that rsync is working on at once, that should help keep this from happening again.
-Greg
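On the "dump caches" suggestion: for a kernel CephFS client, one blunt way to do this is the generic VM drop_caches knob, which evicts unused dentries and inodes and lets the client give up the corresponding caps. A sketch, assuming root on the client (this does not discard dirty data, but syncing first is still sensible):

sync
echo 2 > /proc/sys/vm/drop_caches

Whether that alone would have prevented a multi-million-file EOpen entry here is unclear; limiting rsync concurrency, as Greg suggests, is the other half.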
Re: [ceph-users] Ceph MDS continually respawning (hammer)
On Fri, May 22, 2015 at 11:47 AM, John Spray john.sp...@redhat.com wrote:
> On 22/05/2015 15:33, Adam Tygart wrote:
>> Hello all,
>>
>> The ceph-mds servers in our cluster are performing a constant boot->replay->crash cycle in our systems. I have enabled debug logging for the mds for a restart cycle on one of the nodes [1].
>
> You found a bug, or more correctly you probably found multiple bugs...
> [...]
> Thirdly, something is apparently leading the MDS to think that 5 million files were open in this particular log segment. It seems like an improbable situation given that I can only see a single client in action here. More investigation needed to see how this happened. Can you describe the client workload that was going on in the run up to the system breaking?

We are in the process of rsyncing 60+ million files from our old fileservers into the new cephfs stores. We are down to the last user, and it seems he has at least 25% of the total files. We were copying files at a rate of about 2 million a day before this crash.

> Anyway, actions:
>
> 1. I'm assuming your metadata is not sensitive, as you have shared this debug log. Please could you use "cephfs-journal-tool journal export ~/journal.bin" to grab an offline copy of the raw journal, in case we need to look at it later (this might take a while since your journal seems so large, but the resulting file should compress reasonably well with tar cSzf).

Problematic. cephfs-journal-tool is giving me a stacktrace: http://people.beocat.cis.ksu.edu/~mozes/ceph/cephfs-journal-tool-stacktrace.txt

> 2. Optimistically, you may be able to get out of this situation by modifying the mds_beacon_grace config option on the MDS (set it to something high). This will cause the MDS to continue sending beacons to the mons, even when a thread is failing to yield promptly (as in this case), thereby preventing the mons from regarding the MDS as failed. This hopefully will buy the MDS enough time to complete replay and come up, assuming it doesn't run out of memory in the process of dealing with whatever strangeness is in the journal.

I can hold off on this for a bit if you have something else you'd like me to save while it is in its broken state.

Thanks for the detailed writeup.

--
Adam

> 3. If your MDS eventually makes it through recovery, unmount your client and use "ceph daemon mds.<id> flush journal" to flush and trim the journal: this should result in a situation where the next time the MDS starts, the oversized journal entries are no longer present and startup should go smoothly.
>
> Cheers,
> John
Re: [ceph-users] Ceph MDS continually respawning (hammer)
I believe I grabbed all of these files:

for x in $(rados -p metadata ls | grep -E '^200\.'); do rados -p metadata get ${x} /tmp/metadata/${x}; done
tar czSf journal.tar.gz /tmp/metadata

https://drive.google.com/file/d/0B4XF1RWjuGh5MVFqVFZfNmpfQWc/view?usp=sharing

When this crash occurred, the rsync process should have been going through all of this sequentially, one file at a time. There weren't simultaneous rsyncs, and hadn't been for at least a couple of days. I would have hoped that if the files hadn't been touched in days that *something* would have forcibly closed them and flushed them out of the caches. I also would have thought that a close file/flush in rsync (which I am sure it does, after finishing writing a file) would have let them close in the cephfs session.

--
Adam

On Fri, May 22, 2015 at 2:06 PM, Gregory Farnum g...@gregs42.com wrote:
> [...]
>
> Ugh. We appear to be trying to allocate too much memory for this event in the journal dump; we'll need to fix this. :(
>
> In the meantime, it's a harder workflow, but if you can grab all the objects in the metadata pool whose name starts with "200." that's the raw data that the journal dumper is trying to grab, and we can assemble it ourselves for checking later on.
>
> That said, you might actually have that many files still open in some fashion from your clients. If you can dump caches on them periodically, and/or limit the number of files that rsync is working on at once, that should help keep this from happening again.
> -Greg
Re: [ceph-users] Ceph MDS continually respawning (hammer)
On Fri, May 22, 2015 at 12:34 PM, Adam Tygart mo...@ksu.edu wrote:
> I believe I grabbed all of these files:
>
> for x in $(rados -p metadata ls | grep -E '^200\.'); do rados -p metadata get ${x} /tmp/metadata/${x}; done
> tar czSf journal.tar.gz /tmp/metadata
>
> https://drive.google.com/file/d/0B4XF1RWjuGh5MVFqVFZfNmpfQWc/view?usp=sharing
>
> When this crash occurred, the rsync process should have been going through all of this sequentially, one file at a time. There weren't simultaneous rsyncs, and hadn't been for at least a couple of days. I would have hoped that if the files hadn't been touched in days that *something* would have forcibly closed them and flushed them out of the caches. I also would have thought that a close file/flush in rsync (which I am sure it does, after finishing writing a file) would have let them close in the cephfs session.

This is trickier than it sounds with the kernel dcache, unfortunately. We improved it a bit just last week but we'll have to try and diagnose what happened in this case more before we can say if it was that issue or something else.
-Greg

> --
> Adam
>
> [...]