Nathan, I can tell you that 4 months of data loss was not easy to deal with - and it was for a very significant client.
I'm not sure how well zrs would work in this case - would it continue to access the deleted Data.fs file? We were copying the Data.fs file nightly, but the problem was that it was not the live file, and there was no descriptor left at all through which to access the 'live' Data.fs, which had long since been deleted. I'll post to ZODB too in the hope that they know something. I don't think this is actually a problem with the ZODB itself, but more an issue with how the zeopack script is written or intended to work.

On Wed, May 27, 2015 at 2:20 PM, Nathan Van Gheem <[email protected]> wrote:

> Hi Tom,
>
> This is more of a ZODB problem, isn't it?
>
> Perhaps also add this as an issue here:
> https://github.com/zopefoundation/ZODB
>
> I have never seen or heard of anything like this. Pretty scary. I can't imagine losing 4 months of data. I guess this is my chance to plug using zrs to have a live backup of your ZODB.
>
> -Nathan
>
> On Tue, May 26, 2015 at 11:03 PM, Tom Cameron <[email protected]> wrote:
>
>> I've sent this to the Zope list too, but figured many people use Plone, so I may get some feedback here as well.
>>
>> We've been running Zope servers for about 15 years. In the last few years we have had a number of Zope servers instantly lose data upon restart. This has happened to us about 5 times over the past 2 years, on 4 different servers.
>>
>> Upon restarting Zope we notice a rollback of data - sometimes months old - and a Data.fs with a much older timestamp than expected. At that point the data is irretrievable: we find it is not in any backups or anywhere on the file system.
>>
>> We have finally worked out what is happening, but not exactly why. Below is a detailed explanation of what we are seeing.
>>
>> We are keen to know if anybody else is having similar issues, or whether we have set something up incorrectly and need to change how we pack the database. This may also serve as a warning to others.
>>
>> *Explanation*
>> The issue is that Zope keeps writing through its original file descriptor: a Data.fs still remains in the folder, but Zope is actually writing to a (deleted) file, which can be seen with lsof. Every server affected by this was running ZEO and had zeopack in the default configuration. We believe that when zeopack packs the ZODB - moving the old Data.fs aside as Data.fs.old and eventually deleting it - Zope for some reason never picks up the new file.
>>
>> This is what we believe happens:
>>
>> 1. Zope is connected to a file, Data.fs.
>> 2. zeopack packs this file into a new file, Data.fs.tmp.
>> 3. zeopack deletes any file named Data.fs.old.
>> 4. zeopack then renames Data.fs to Data.fs.old.
>> 5. zeopack then renames Data.fs.tmp to Data.fs.
>>
>> The problem is that Zope is connected directly to the original Data.fs via its file handle, so even though the file is renamed it continues to write to what is now Data.fs.old.
>>
>> A subsequent zeopack later deletes the Data.fs.old file as part of the same sequence, but Zope keeps writing to this now-deleted file.
>>
>> We then have a situation where Zope is connected to a deleted file and is writing all its data to it. If Zope is ever stopped, that descriptor is closed and, effectively, so is the file.
>>
>> It is true that careful forensics can retrieve the file, but in the first few instances this happened we were totally baffled as to where the data had gone. In one case we had not restarted Zope for 4 months, and so we lost 4 months of data.
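The rename sequence above is enough to reproduce the problem outside Zope entirely. Here is a minimal Python sketch of the underlying POSIX behaviour (nothing Zope- or zeopack-specific; the file names are just stand-ins) showing why a process that holds the storage file open never notices steps 3-5:

    import os

    # A writer opens "Data.fs" and keeps it open, much as the ZEO server
    # keeps its storage file open for the life of the process.
    writer = open('Data.fs', 'a')
    writer.write('transaction 1\n')
    writer.flush()

    # Pack step 4: the live file is renamed to Data.fs.old.  The writer's
    # descriptor still refers to the same inode; only the name has changed.
    os.rename('Data.fs', 'Data.fs.old')

    # Pack step 5: a freshly packed file takes over the old name
    # (standing in for the Data.fs.tmp -> Data.fs rename).
    open('Data.fs', 'w').close()

    # The writer keeps appending - but to Data.fs.old, not to the new Data.fs.
    writer.write('transaction 2\n')
    writer.flush()

    # The next pack (step 3) removes Data.fs.old.  The inode now has no
    # name at all, which is exactly the "(deleted)" state lsof reports.
    os.remove('Data.fs.old')
    writer.write('transaction 3\n')
    writer.flush()

    print(os.path.getsize('Data.fs'))         # 0: none of the writes landed here
    print(os.fstat(writer.fileno()).st_size)  # 42: all three writes sit on the unnamed inode

    writer.close()        # once this descriptor closes, the unnamed inode is gone for good
    os.remove('Data.fs')  # tidy up the empty stand-in file

Any long-running process behaves this way if its open file is renamed and then unlinked underneath it; the open question is why the storage is not reopened against the new Data.fs after the pack.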
>> *Environment*
>> We are running standard Plone servers, in this case on Debian 5.0.10, with the following:
>>
>> - Plone 4.3.4.1 (4309)
>> - CMF 2.2.8
>> - Zope 2.13.22
>> - Python 2.7.8 (default, Jan 27 2015, 14:19:28) [GCC 4.3.2]
>> - PIL 2.3.0 (Pillow)
>> - ZEO: plone.recipe.zeoserver 1.2.6
>>
>> The effective-user is set correctly for this instance ("landcarevic") and the Data.fs is opened properly under normal circumstances.
>>
>> *Detailed Symptoms*
>> This is a real case we discovered, and one where we managed to retrieve the lost file.
>>
>> Zope will continue to write to a Data.fs that has technically been deleted. We noticed this because the Data.fs.tmp had a much more recent timestamp than the Data.fs in the var/filestorage directory:
>>
>> -rw-r--r-- 1 landcarevic psacln 6735952811 2015-04-12 00:11 Data.fs
>> -rw-r--r-- 1 landcarevic psacln 18594749 2015-04-12 00:11 Data.fs.index
>> -rw-r--r-- 1 landcarevic psacln 6 2015-03-30 11:20 Data.fs.lock
>> -rw-r--r-- 1 landcarevic psaserv 14600344 2015-05-26 10:27 Data.fs.tmp
>>
>> An "lsof | grep Data.fs" shows whether a file descriptor is pointing at a deleted file on the filesystem:
>>
>> python2.7 9307 landcarevic 7uW REG 0,31 6 4986845 /var/www/vhosts/landcarevic.net.au/plone43/var/filestorage/Data.fs.lock
>> python2.7 9307 landcarevic 11u REG 0,31 14600344 4986235 /var/www/vhosts/landcarevic.net.au/plone43/var/filestorage/Data.fs.tmp
>> python2.7 9307 landcarevic 12u REG 0,31 7058741434 4986349 (deleted) /var/www/vhosts/landcarevic.net.au/plone43/var/filestorage/Data.fs.old
>>
>> The process id for the instance is 9307, and it is still holding open a Data.fs.old that has been deleted - most likely by a zeopack run from cron, since each pack renames the previous Data.fs to Data.fs.old, which is eventually deleted. Zope never properly picks up the new Data.fs, for reasons we have not yet worked out.
>>
>> *Recovery*
>> As long as the Zope server is not shut down or restarted, it will keep the file open and continue writing to disk. The data can be recovered by copying from /proc, where /proc/{process id}/fd/{file descriptor number} can be taken from the lsof output above:
>>
>> # cp /proc/9307/fd/12 /home/alyssa/Data.fs-current
>> # ls -al /home/alyssa/
>> ...
>> -rw-r--r-- 1 root root 7058741434 2015-05-26 11:11 Data.fs-current
>>
>> The copy has the correct timestamp and file size.

(A small script automating this check is sketched at the end of this message.)

>> --
>> *Tom Cameron*
>
> --
> Nathan Van Gheem
> Solutions Architect
> Wildcard Corp

--
*Tom Cameron*
Technical Director

*Mooball IT*
Level 1, 888 Brunswick Street, New Farm, Brisbane QLD 4005
mob: +61 (4) 5584 1995  ph: +61 (7) 3040 4440
http://www.mooball.com

*Try Kitovu <https://kitovu.com/> - A better way to manage client work*

Email disclaimer <http://www.mooball.com/email_disclaimer>
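For anyone who wants to catch this condition before a restart makes it unrecoverable, here is a rough sketch of how the check under *Detailed Symptoms* and the copy under *Recovery* could be automated. It assumes a Linux /proc filesystem and enough privileges to read the ZEO process's fd directory; the PID, rescue directory and matching rule are illustrative only, not part of any Plone or ZEO tooling:

    import os
    import shutil

    # Illustrative values - substitute your own ZEO/instance process id and a
    # destination with enough free space.
    PID = 9307
    RESCUE_DIR = '/home/alyssa'

    fd_dir = '/proc/%d/fd' % PID
    for fd in os.listdir(fd_dir):
        fd_path = os.path.join(fd_dir, fd)
        try:
            target = os.readlink(fd_path)
        except OSError:
            continue  # the descriptor may have gone away between listdir() and readlink()
        # On Linux, readlink() on /proc/<pid>/fd/<n> reports unlinked files with
        # a " (deleted)" suffix, matching the deleted Data.fs.old that lsof shows.
        if 'Data.fs' in target and target.endswith(' (deleted)'):
            print('fd %s -> %s' % (fd, target))
            # Copy the still-open inode out through /proc, the same operation as
            # "cp /proc/9307/fd/12 /home/alyssa/Data.fs-current" above.
            shutil.copyfile(fd_path, os.path.join(RESCUE_DIR, 'Data.fs-current'))

The copy is taken while Zope may still be writing, so for a real recovery you would pause writes, or repeat the copy during a quiet period, before trusting it.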
_______________________________________________
Setup mailing list
[email protected]
https://lists.plone.org/mailman/listinfo/plone-setup
