Hi Tom, This is more of a ZODB problem isn't it?
Perhaps also add this as an issue here: https://github.com/zopefoundation/ZODB I have never seen or heard of anything like this. Pretty scary. I can't imagine losing 4 months of data. I guess this is my chance to plug using zrs to have a live backup of your ZODB. -Nathan On Tue, May 26, 2015 at 11:03 PM, Tom Cameron <[email protected]> wrote: > Ive sent this to the zope list too but figured many people use Plone so I > may get some feedback here too. > > > We’ve been running Zope servers for about 15 years. In the last few years > we have had a number of Zope servers instantly lose data upon restart. This > has happened to us about 5 times over the past 2 years on 4 different > servers. > > Upon restarting Zope we notice a rollback of data to sometimes months old > and a Data.fs with a much older timestamp than expected. At this point the > data was irretrievable. We find that it is not in any backups or anywhere > on the file system. > > We have finally worked out what is happening but not exactly why. Below is > a detailed explanation of what we are seeing. > > We are keen to know if anybody else is having similar issues or if perhaps > we have setup something incorrectly and need to change how we pack the > database. Also this may serve as a warning to others. > > *Explanation* > The issue is that Zope is losing the file descriptor, the Data.fs still > remains in the folder but Zope keeps writing to a (deleted) file descriptor > which can be seen with lsof. Every server affected by this was running ZEO > and had zeopack in the default configuration. We believe for some reason > when zeopack tries to pack the ZODB, that when the Data.fs is moved and > renamed to Data.fs.old and eventually deleted, Zope doesn’t recognize the > new file. > > This is what we believe happens: > > 1. Zope is connected to a file Data.fs. > 2. zeopack packs this file into a new file Data.fs.tmp > 3. zeopack deletes any file named Data.fs.old > 4. zeopack then renames Data.fs to Data.fs.old > 5. zeopack then renames Data.fs.tmp to Data.fs > > The problem is that Zope is connected directly to the Data.fs file via its > handler and even though it is renamed it continues to write to Data.fs.old. > > Later on, we do a subsequent zeopack and the Data.fs.old file is deleted > in the process, but Zope keeps writing to this deleted file. > > We now have a situation where Zope is connected to a deleted file and is > writing all its data to it. If Zope is ever stopped, the connection is lost > and so effectively is the file. > > It is true that expert forensics could retrieve the file but in the first > few instances that this happened we were totally baffled as to where the > data had gone. In one case we had not restarted Zope for 4 months and thus > we lost 4 months of data. > > *Environment* > We are running standard Plone servers on Debian 5.0.10 in this case, with > following: > > - Plone 4.3.4.1 (4309) > - CMF 2.2.8 > - Zope 2.13.22 > - Python 2.7.8 (default, Jan 27 2015, 14:19:28) [GCC 4.3.2] > - PIL 2.3.0 (Pillow) > - ZEO: plone.recipe.zeoserver 1.2.6 > > Effective-user is set correctly for this instance “landcarevic” and > properly mounts the Data.fs under normal circumstances. > > *Detailed Symptoms* > This is a real case we discovered and managed to retrieve the lost file. > > Zope will continue to write to a Data.fs that has been technically > deleted. We are able to notice this because the Data.fs.tmp will have a > much more recent timestamp than the Data.fs in the var/filestorage > directory. > > -rw-r--r-- 1 landcarevic psacln 6735952811 2015-04-12 00:11 Data.fs > > -rw-r--r-- 1 landcarevic psacln 18594749 2015-04-12 00:11 Data.fs.index > > -rw-r--r-- 1 landcarevic psacln 6 2015-03-30 11:20 Data.fs.lock > > -rw-r--r-- 1 landcarevic psaserv 14600344 2015-05-26 10:27 Data.fs.tmp > > An “lsof | grep Data.fs” can show if the file descriptor is pointing to a > deleted file on the fs: > > python2.7 9307 landcarevic 7uW REG 0,31 6 > 4986845 /var/www/vhosts/ > landcarevic.net.au/plone43/var/filestorage/Data.fs.lock > > python2.7 9307 landcarevic 11u REG 0,31 14600344 > 4986235 /var/www/vhosts/ > landcarevic.net.au/plone43/var/filestorage/Data.fs.tmp > > python2.7 9307 landcarevic 12u REG 0,31 7058741434 > 4986349 (deleted)/var/www/vhosts/ > landcarevic.net.au/plone43/var/filestorage/Data.fs.old > > The process id for the instance is 9307 and it’s still holding open a > Data.fs.old that has been deleted. Most likely due to a ZEO pack before > from the cron in this case. Because it packs the Data.fs and then moves the > old one to Data.fs.old which eventually gets deleted. But Zope never > properly recognises the new Data.fs for some reason. > > *Recovery* > As long as the Zope server doesn’t get shut down or restarted, it will > hold this open and continue to write to the disk. It can be recovered by > copying from /proc where /proc/{process ID}/fd/{file descriptor ID} can be > found from running lsof as previously: > > # cp /proc/9307/fd/12 /home/alyssa/Data.fs-current > > # ls -al /home/alyssa/ > > ... > > -rw-r--r-- 1 root root 7058741434 2015-05-26 11:11 Data.fs-current > > Which has the correct timestamp and file size. > > > > > -- > *Tom Cameron* > > > > -- > *Tom Cameron* > Technical Director > > *Mooball IT* > > Level 1, 888 Brunswick Street, New Farm, Brisbane QLD 4005 > mob: +61 (4) 5584 1995 ph: +61 (7) 3040 4440 > http://www.mooball.com > > <http://www.mooball.com/> > > *Try Kitovu <https://kitovu.com/> - A better way to manage client work* > > Email disclaimer <http://www.mooball.com/email_disclaimer> > > _______________________________________________ > Setup mailing list > [email protected] > https://lists.plone.org/mailman/listinfo/plone-setup > > -- Nathan Van Gheem Solutions Architect Wildcard Corp
_______________________________________________ Setup mailing list [email protected] https://lists.plone.org/mailman/listinfo/plone-setup
