Hi Tom,

This is more of a ZODB problem isn't it?

Perhaps also add this as an issue here:
https://github.com/zopefoundation/ZODB

I have never seen or heard of anything like this. Pretty scary. I can't
imagine losing 4 months of data. I guess this is my chance to plug using
zrs to have a live backup of your ZODB.


-Nathan

On Tue, May 26, 2015 at 11:03 PM, Tom Cameron <[email protected]> wrote:

> Ive sent this to the zope list too but figured many people use Plone so I
> may get some feedback here too.
>
>
> We’ve been running Zope servers for about 15 years. In the last few years
> we have had a number of Zope servers instantly lose data upon restart. This
> has happened to us about 5 times over the past 2 years on 4 different
> servers.
>
> Upon restarting Zope we notice a rollback of data to sometimes months old
> and a Data.fs with a much older timestamp than expected. At this point the
> data was irretrievable. We find that it is not in any backups or anywhere
> on the file system.
>
> We have finally worked out what is happening but not exactly why. Below is
> a detailed explanation of what we are seeing.
>
> We are keen to know if anybody else is having similar issues or if perhaps
> we have setup something incorrectly and need to change how we pack the
> database. Also this may serve as a warning to others.
>
> *Explanation*
> The issue is that Zope is losing the file descriptor, the Data.fs still
> remains in the folder but Zope keeps writing to a (deleted) file descriptor
> which can be seen with lsof. Every server affected by this was running ZEO
> and had zeopack in the default configuration. We believe for some reason
> when zeopack tries to pack the ZODB, that when the Data.fs is moved and
> renamed to Data.fs.old and eventually deleted, Zope doesn’t recognize the
> new file.
>
> This is what we believe happens:
>
> 1. Zope is connected to a file Data.fs.
> 2. zeopack packs this file into a new file Data.fs.tmp
> 3. zeopack deletes any file named Data.fs.old
> 4. zeopack then renames Data.fs to Data.fs.old
> 5. zeopack then renames Data.fs.tmp to Data.fs
>
> The problem is that Zope is connected directly to the Data.fs file via its
> handler and even though it is renamed it continues to write to Data.fs.old.
>
> Later on, we do a subsequent zeopack and the Data.fs.old file is deleted
> in the process, but Zope keeps writing to this deleted file.
>
> We now have a situation where Zope is connected to a deleted file and is
> writing all its data to it. If Zope is ever stopped, the connection is lost
> and so effectively is the file.
>
> It is true that expert forensics could retrieve the file but in the first
> few instances that this happened we were totally baffled as to where the
> data had gone. In one case we had not restarted Zope for 4 months and thus
> we lost 4 months of data.
>
> *Environment*
> We are running standard Plone servers on Debian 5.0.10 in this case, with
> following:
>
>    - Plone 4.3.4.1 (4309)
>    - CMF 2.2.8
>    - Zope 2.13.22
>    - Python 2.7.8 (default, Jan 27 2015, 14:19:28) [GCC 4.3.2]
>    - PIL 2.3.0 (Pillow)
>    - ZEO: plone.recipe.zeoserver 1.2.6
>
> Effective-user is set correctly for this instance “landcarevic” and
> properly mounts the Data.fs under normal circumstances.
>
> *Detailed Symptoms*
> This is a real case we discovered and managed to retrieve the lost file.
>
> Zope will continue to write to a Data.fs that has been technically
> deleted. We are able to notice this because the Data.fs.tmp will have a
> much more recent timestamp than the Data.fs in the var/filestorage
> directory.
>
> -rw-r--r-- 1 landcarevic psacln  6735952811 2015-04-12 00:11 Data.fs
>
> -rw-r--r-- 1 landcarevic psacln    18594749 2015-04-12 00:11 Data.fs.index
>
> -rw-r--r-- 1 landcarevic psacln           6 2015-03-30 11:20 Data.fs.lock
>
> -rw-r--r-- 1 landcarevic psaserv   14600344 2015-05-26 10:27 Data.fs.tmp
>
> An “lsof | grep Data.fs” can show if the file descriptor is pointing to a
> deleted file on the fs:
>
> python2.7  9307  landcarevic    7uW     REG               0,31          6
>   4986845 /var/www/vhosts/
> landcarevic.net.au/plone43/var/filestorage/Data.fs.lock
>
> python2.7  9307  landcarevic   11u      REG               0,31   14600344
>   4986235 /var/www/vhosts/
> landcarevic.net.au/plone43/var/filestorage/Data.fs.tmp
>
> python2.7  9307  landcarevic   12u      REG               0,31 7058741434
>   4986349  (deleted)/var/www/vhosts/
> landcarevic.net.au/plone43/var/filestorage/Data.fs.old
>
> The process id for the instance is 9307 and it’s still holding open a
> Data.fs.old that has been deleted. Most likely due to a ZEO pack before
> from the cron in this case. Because it packs the Data.fs and then moves the
> old one to Data.fs.old which eventually gets deleted. But Zope never
> properly recognises the new Data.fs for some reason.
>
> *Recovery*
> As long as the Zope server doesn’t get shut down or restarted, it will
> hold this open and continue to write to the disk. It can be recovered by
> copying from /proc where /proc/{process ID}/fd/{file descriptor ID} can be
> found from running lsof as previously:
>
> # cp /proc/9307/fd/12 /home/alyssa/Data.fs-current
>
> # ls -al /home/alyssa/
>
> ...
>
> -rw-r--r--  1 root root 7058741434 2015-05-26 11:11 Data.fs-current
>
> Which has the correct timestamp and file size.
>
>
>
>
> --
> *Tom Cameron*
>
>
>
> --
> *Tom Cameron*
> Technical Director
>
> *Mooball IT*
>
> Level 1, 888 Brunswick Street, New Farm, Brisbane QLD 4005
> mob: +61 (4) 5584 1995  ph: +61 (7) 3040 4440
> http://www.mooball.com
>
> <http://www.mooball.com/>
>
> *Try Kitovu <https://kitovu.com/> - A better way to manage client work*
>
> Email disclaimer <http://www.mooball.com/email_disclaimer>
>
> _______________________________________________
> Setup mailing list
> [email protected]
> https://lists.plone.org/mailman/listinfo/plone-setup
>
>


-- 
Nathan Van Gheem
Solutions Architect
Wildcard Corp
_______________________________________________
Setup mailing list
[email protected]
https://lists.plone.org/mailman/listinfo/plone-setup

Reply via email to