Re: [ZODB-Dev] more lockup information / zope2.9.6+zodb 3.6.2
On 10 Apr 2007, at 20:19, Alan Runyan wrote:

> File "/usr/local/python-2.4.4/lib/python2.4/asyncore.py", line 343, in recv
>     data = self.socket.recv(buffer_size)
> error: (113, 'No route to host')

> File "/usr/local/python-2.4.4/lib/python2.4/asyncore.py", line 343, in recv
>     data = self.socket.recv(buffer_size)
> error: (110, 'Connection timed out')

With errors like these I would concentrate on the network first (hardware as well as network settings, routes, etc.) and make 100% sure nothing strange is happening on that end.

jens

___
For more information about ZODB, see the ZODB Wiki: http://www.zope.org/Wikis/ZODB/
ZODB-Dev mailing list - ZODB-Dev@zope.org
http://mail.zope.org/mailman/listinfo/zodb-dev
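The two tracebacks carry raw errno values: on Linux, 113 is `EHOSTUNREACH` and 110 is `ETIMEDOUT` (other platforms number them differently). As a hedged illustration in modern Python 3, a small helper can turn such a socket error into a hint about where to look. The function `classify_socket_error` and its category names are hypothetical, not part of ZEO or asyncore; it matches on symbolic errno names so it does not depend on the Linux numbers.

```python
import errno
import os

# Hypothetical groupings of socket errnos into rough diagnoses.
NETWORK_PATH_ERRORS = {errno.EHOSTUNREACH, errno.ENETUNREACH}
TIMEOUT_ERRORS = {errno.ETIMEDOUT}


def classify_socket_error(exc: OSError) -> str:
    """Return a coarse category for a socket-level OSError."""
    if exc.errno in NETWORK_PATH_ERRORS:
        # No route: look at routing, switches, NICs, cabling, duplicate IPs.
        return "network path"
    if exc.errno in TIMEOUT_ERRORS:
        # The peer stopped answering: overloaded host or a lossy link.
        return "timeout"
    return "other"


if __name__ == "__main__":
    for code in (errno.EHOSTUNREACH, errno.ETIMEDOUT):
        exc = OSError(code, os.strerror(code))
        # On Linux the numbers printed here are 113 and 110.
        print(f"{errno.errorcode[code]} ({code}) -> {classify_socket_error(exc)}")
```

Both errors in this thread land in categories that point away from ZODB itself and toward the network, which is the order of investigation suggested above.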
Re: [ZODB-Dev] more lockup information / zope2.9.6+zodb 3.6.2
On 10 Apr 2007, at 21:39, Chris Withers wrote:

> I'd look at the switches and maybe even the nics and cables :-S

And if you are doing that, also check your network to make sure all IP addresses are unique.

> (..but I'm not a sysadmin.)

And neither am I!

--r
Russ Ferriday - Topia Systems - Open Source content management with Plone and Zope
[EMAIL PROTECTED] - office: +44 2076 1777588 - mobile: +44 7789 338868 - skype: ferriday
a member of Zea Partners
Re: [ZODB-Dev] more lockup information / zope2.9.6+zodb 3.6.2
Alan Runyan wrote:

>>> data = self.socket.recv(buffer_size)
>>> error: (113, 'No route to host')
>>
>> That *is* very odd, anything other than pound being used for load balancing or traffic shaping?
>
> This has to be a major problem maker in the system. Pound is simply round robin connections to pool of Zope.

I'd look at the switches and maybe even the nics and cables :-S
(..but I'm not a sysadmin.)

>>> File "/usr/local/python-2.4.4/lib/python2.4/asyncore.py", line 343, in recv
>>>     data = self.socket.recv(buffer_size)
>>> error: (110, 'Connection timed out')
>>
>> Storage server too busy :-(
>
> the storage server can be "too busy" when it's only reading?

'course - too many pickles getting sucked down...

Chris

--
Simplistix - Content Management, Zope & Python Consulting
           - http://www.simplistix.co.uk
Re: [ZODB-Dev] ZEO client cache tempfile oddness
[Paul Winkler]
> ... If I understand this stuff correctly, the code in question on a filesystem that *doesn't* have the sparse file optimization would equate to "write N null bytes to this file as fast as possible." True?

[Dieter Maurer]
> Posix defines the semantics. I have not looked it up, but a possible interpretation would also be: write the "n.th" byte and let all other bytes undefined.

POSIX specifies that uninitialized file positions must "act as if" they contained NUL bytes. NTFS isn't POSIX, but for a non-sparse file NTFS does in fact "write N NUL bytes to the file as fast as possible" (and for a sparse file, NTFS acts like a POSIX file in this respect).
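A minimal Python 3 sketch of the POSIX rule described above: seek beyond the end of a file, write one byte, and the never-written gap reads back as NUL bytes, whether or not the filesystem actually stores the gap. The helper name `write_past_end` is hypothetical and this is not the ZEO code itself.

```python
import os
import tempfile


def write_past_end(path: str, size: int) -> bytes:
    """Create `path` with a single b'x' at offset size - 1; return its contents.

    Every byte before that offset is a gap that was never written.
    """
    with open(path, "wb") as f:
        f.seek(size - 1)  # move the file offset beyond the current EOF (0)
        f.write(b"x")     # writing here extends the file to `size` bytes
    with open(path, "rb") as f:
        return f.read()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        data = write_past_end(os.path.join(d, "gap.bin"), 4096)
        # Reads from the unwritten gap return bytes with value 0.
        print(len(data), data[:-1] == b"\0" * 4095, data[-1:])
        # prints: 4096 True b'x'
```

Note that this only shows what a reader *sees*; it says nothing about whether disk space was reserved for the gap, which is the point of contention in this thread.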
Re: [ZODB-Dev] ZEO client cache tempfile oddness
[Paul Winkler]
> Our experiments suggest that ext2, ext3, and reiserfs optimize for sparse files so there is no such guarantee. AFAICT from some quick googling and wikipediaing, the same is true for NTFS, XFS, JFS, ZFS. I suspect we've accounted for the majority of the production Zope installations in the world.

[Benji York]
> In that case it would seem better to just remove the ineffectual code altogether.

[Jim Fulton]
> +1

-0. See my other response. +1 on beefing up the comment, though.

On all systems the code does no harm and does /inform/ the OS of how large a file is needed. At least on NTFS it also does all that reasonably can be done to ensure enough space actually exists (while NTFS supports sparse files, they aren't the default, and the ZEO code does not create a sparse file under NTFS).

The Windows behavior could be obtained effectively on most other platforms by adding a loop that writes a non-NUL byte to every (say) 512th byte position. This would be redundant on Windows (NTFS physically writes NUL bytes to every "missing" position in a dense file whenever the EOF pointer advances), but would do no harm there either.

Or it could be that people are happy to have ZEO blow up later ;-)
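The loop suggested above might look like the following Python 3 sketch. It assumes the 512-byte stride from the example (real filesystem block sizes vary; 4096 is common), and it only helps if the stride is no larger than the block size. `reserve_by_touching` is a hypothetical name, not something ZEO provides, and as the thread points out it still cannot guarantee the space will be there later.

```python
import os
import tempfile


def reserve_by_touching(path: str, size: int, stride: int = 512) -> None:
    """Create a `size`-byte file with a non-NUL byte every `stride` bytes.

    On filesystems that optimize sparse files, writing real data into each
    block (assuming stride <= block size) typically makes the filesystem
    allocate that block now instead of leaving a hole.
    """
    if size <= 0 or stride <= 0:
        raise ValueError("size and stride must be positive")
    with open(path, "wb") as f:
        for offset in range(0, size, stride):
            f.seek(offset)
            f.write(b"x")  # junk, non-NUL byte
        # Write the last byte too, so EOF lands exactly at `size`.
        f.seek(size - 1)
        f.write(b"x")
        f.flush()
        os.fsync(f.fileno())  # ask the OS to push the data to the device


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "cache.bin")
        reserve_by_touching(path, 1 << 20)
        print(os.path.getsize(path))
```

The trade-off is roughly `size / stride` small writes instead of `size` bytes of output, which is why this is far cheaper than writing junk to every byte.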
Re: [ZODB-Dev] ZEO client cache tempfile oddness
[Tim Peters]
> The ZEO client cache is stored in a fixed-size disk file. When a ZEO client needs to create this file for the first time, it's trying to ensure there's enough space on disk for it at the start, and reserve that disk space then, rather than risk dying with a "no space left on device" error umpteen hours later. Alas, there is no portable way to do so (the C standard says nothing about physical devices such as disks).

[Paul Winkler]
> So it might work as intended on some systems but inspire false confidence on others. That's sad, but I guess we can't do much about it.

The full intent is almost met on Windows. On other systems it at least /informs/ the OS about the intended size of the file. In any case it does no harm.

[Paul]
>>> ... "saves" means. Does the behavior vary on different filesystems?
>>
>> Yes. Sounds like it optimizes for sparse files. Not all filesystems do.
>
> OK... I'm still wondering on which filesystems this code actually does guarantee sufficient space. Our experiments suggest that ext2, ext3, and reiserfs optimize for sparse files so there is no such guarantee. AFAICT from some quick googling and wikipediaing, the same is true for NTFS, XFS, JFS, ZFS. I suspect we've accounted for the majority of the production Zope installations in the world. The only fs I found that has no sparse file support is HFS+. Not sure about UFS (I found people claiming both no and yes).

NTFS has "optional" sparse-file support, meaning there is a way to create sparse files under NTFS, but it's not the default. It isn't exposed at the C stdio level (creating a sparse file under NTFS requires additional Windows-specific API calls), and Python builds on C stdio.
In any case there's no benefit to sparse files in this context: while the ZEO cache file starts out "almost empty", every byte is eventually used, so it seems good to make whatever cheap efforts can be made to minimize the chance that ZEO will die umpteen hours later because there isn't enough space for the client cache file it asked for at the start.

>> If you care, the intent could probably be better served by changing the code to write a junk byte at every (say) thousandth byte offset throughout the file (at least one non-NUL byte per disk block). Alas, there would still be no guarantee that there's actually enough space on the physical disk to store it.
>
> OK, then I don't see how we could get a real guarantee without actually writing junk to every byte, which might be just a little slower :)

Still no guarantee (e.g., there might be enough RAM to hold all the bytes in I/O buffers, but not enough disk space remaining to materialize them), but writing a non-NUL byte per disk block would be as effective in practice as writing to every byte position.

> Short of that, since crystal ball technology is still not widely deployed outside the Python Secret Underground, we can't know if currently free space will still be available when we need it.

That's right. On all platforms the current dance suffices to inform the I/O system of how large a file ZEO needs (note that just seeking to the max size does not suffice; a byte needs to be written at the end to set the EOF pointer). On NTFS that's almost reliable (see above); on other systems it may or may not be.
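The parenthetical point above, that seeking to the maximum size does not by itself set the EOF pointer, is easy to check. This Python 3 sketch mirrors the "dance" as described in the thread rather than quoting ZEO's source; `sizes_after_seek_and_write` is a hypothetical name.

```python
import os
import tempfile


def sizes_after_seek_and_write(path: str, maxsize: int):
    """Return the file size after a bare seek, then after seek + 1-byte write."""
    with open(path, "wb") as f:
        f.seek(maxsize)        # seek beyond EOF: only moves the file offset
        f.flush()
        after_seek = os.fstat(f.fileno()).st_size
        f.seek(maxsize - 1)
        f.write(b"\0")         # writing the last byte moves EOF to maxsize
        f.flush()
        after_write = os.fstat(f.fileno()).st_size
    return after_seek, after_write


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        print(sizes_after_seek_and_write(os.path.join(d, "c.bin"), 1 << 20))
        # prints: (0, 1048576)
```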
Re: [ZODB-Dev] more lockup information / zope2.9.6+zodb 3.6.2
> I've not had anything but bad experiences with pound myself, lvs seems a much more preferable alternative...

We have not had such negative experiences with pound.

>> data = self.socket.recv(buffer_size)
>> error: (113, 'No route to host')
>
> That *is* very odd, anything other than pound being used for load balancing or traffic shaping?

This has to be a major problem maker in the system. Pound is simply round robin connections to pool of Zope.

>> File "/usr/local/python-2.4.4/lib/python2.4/asyncore.py", line 343, in recv
>>     data = self.socket.recv(buffer_size)
>> error: (110, 'Connection timed out')
>
> Storage server too busy :-(

the storage server can be "too busy" when it's only reading?

cheers
alan
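Pound is described here as plain round-robin over the pool of Zope clients. A toy Python 3 model (not Pound's code; the backend names and the "hung" set are hypothetical, with the pool size and the four failed clients taken from this thread) shows one consequence: a balancer that only rotates, without noticing that a backend has stopped answering, keeps sending every Nth request to a client stuck waiting on the ZEO server, so a fixed fraction of visitors get errors.

```python
from collections import Counter
from itertools import cycle


def simulate_round_robin(backends, hung, n_requests):
    """Dispatch n_requests round-robin over `backends` and tally outcomes.

    `hung` is the set of backends that never answer (e.g. Zope clients
    blocked on the storage server); their requests count as errors.
    """
    rr = cycle(backends)
    outcome = Counter()
    for _ in range(n_requests):
        backend = next(rr)
        outcome["error" if backend in hung else "ok"] += 1
    return outcome


if __name__ == "__main__":
    pool = [f"zope{i}" for i in range(10)]  # 10 read-only ZEO clients
    hung = {"zope0", "zope1", "zope2", "zope3"}  # 4 clients "went out"
    result = simulate_round_robin(pool, hung, n_requests=1000)
    print(result)  # 4 of 10 backends hung -> 400 errors, 600 ok
```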
Re: [ZODB-Dev] more lockup information / zope2.9.6+zodb 3.6.2
Alan Runyan wrote:

> We have 10 ZEO clients that are for public consumption "READ ONLY". We have a separate ZEO client that is writing that is on a separate box.

I'd put money on the client doing the writing causing problems. That or client side cache thrash caused by zcatalog or similar ;-)

> The ZEO Server was consistently at 2% CPU.

What was the network and disk i/o like?

> getting through the cache and back to pound which was then load

I've not had anything but bad experiences with pound myself, lvs seems a much more preferable alternative...

> The customer was posting content throughout the slashdot. The problem was that when the clients would update they would end up through 500s

not sure what that last bit means...

> data = self.socket.recv(buffer_size)
> error: (113, 'No route to host')

That *is* very odd, anything other than pound being used for load balancing or traffic shaping?

> File "/usr/local/python-2.4.4/lib/python2.4/asyncore.py", line 343, in recv
>     data = self.socket.recv(buffer_size)
> error: (110, 'Connection timed out')

Storage server too busy :-(

cheers,

Chris

--
Simplistix - Content Management, Zope & Python Consulting
           - http://www.simplistix.co.uk
Re: [ZODB-Dev] more lockup information / zope2.9.6+zodb 3.6.2
On Apr 10, 2007, at 2:19 PM, Alan Runyan wrote:
...
> For Jim: We did not adjust the transaction timeout. Would that have helped in the case of READ's?

Possibly, I'm not sure and I don't have time now to dig. It might be worth trying, however:

> The customer was posting content throughout the slashdot. The problem was that when the clients would update they would end up through 500s
>
> I am seeing something *very* strange in zeo.log:
>
> 2007-04-10T12:20:53 ERROR ZEO.zrpc.Connection(S) (172.16.235.120:49351) Error caught in asyncore
> Traceback (most recent call last):
>   File "/usr/local/python-2.4.4/lib/python2.4/asyncore.py", line 69, in read
>     obj.handle_read_event()
>   File "/usr/local/python-2.4.4/lib/python2.4/asyncore.py", line 391, in handle_read_event
>     self.handle_read()
>   File "/usr/local/zope/lib/python/ZEO/zrpc/smac.py", line 147, in handle_read
>     d = self.recv(8192)
>   File "/usr/local/python-2.4.4/lib/python2.4/asyncore.py", line 343, in recv
>     data = self.socket.recv(buffer_size)
> error: (113, 'No route to host')
>
> but several more of these:
>
> 2007-04-10T13:55:36 ERROR ZEO.zrpc.Connection(S) (172.16.235.119:44322) Error caught in asyncore
> Traceback (most recent call last):
>   File "/usr/local/python-2.4.4/lib/python2.4/asyncore.py", line 69, in read
>     obj.handle_read_event()
>   File "/usr/local/python-2.4.4/lib/python2.4/asyncore.py", line 391, in handle_read_event
>     self.handle_read()
>   File "/usr/local/zope/lib/python/ZEO/zrpc/smac.py", line 147, in handle_read
>     d = self.recv(8192)
>   File "/usr/local/python-2.4.4/lib/python2.4/asyncore.py", line 343, in recv
>     data = self.socket.recv(buffer_size)
> error: (110, 'Connection timed out')

Obviously, both of these are very suspicious.

Jim

--
Jim Fulton           mailto:[EMAIL PROTECTED]       Python Powered!
CTO                  (540) 361-1714            http://www.python.org
Zope Corporation     http://www.zope.com       http://www.zope.org
Re: [ZODB-Dev] more lockup information / zope2.9.6+zodb 3.6.2
Alan Runyan wrote:

> The website got slashdotted this morning [...] Just FYI: Varnish didn't go over 3% CPU during the traffic surge; over 200 req/second.

Off topic: 200 requests a second seems a bit light for a slashdotting; any more details you can divulge there?

--
Benji York
Senior Software Engineer
Zope Corporation
Re: [ZODB-Dev] ZEO client cache tempfile oddness
Paul Winkler wrote at 2007-4-6 13:30 -0400:
> ...
> If I understand this stuff correctly, the code in question on a
> filesystem that *doesn't* have the sparse file optimization would
> equate to "write N null bytes to this file as fast as possible."
> True?

Posix defines the semantics. I have not looked it up, but a possible interpretation would also be: write the "n.th" byte and let all other bytes undefined.

--
Dieter
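Whether a given filesystem really stores such a file sparsely (the experiments mentioned in this thread) can be checked by comparing a file's logical size with the space allocated for it. A hedged Python 3 sketch: it assumes a POSIX-like platform where `os.stat()` provides `st_blocks` counted in 512-byte units, which holds on Linux and most Unixes but is not guaranteed by POSIX and is absent on Windows. `allocation_report` is a hypothetical helper; to test the filesystem that will hold the cache, pass `dir=` to `TemporaryDirectory`.

```python
import os
import tempfile


def allocation_report(path):
    """Return (logical_size, allocated_bytes) for `path`.

    allocated_bytes is None where st_blocks is unavailable (e.g. Windows).
    Assumes st_blocks counts 512-byte units, as on Linux and most Unixes.
    """
    st = os.stat(path)
    blocks = getattr(st, "st_blocks", None)
    allocated = blocks * 512 if blocks is not None else None
    return st.st_size, allocated


if __name__ == "__main__":
    size = 16 * 1024 * 1024  # 16 MiB; an arbitrary test size
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "cache.bin")
        with open(path, "wb") as f:
            f.seek(size - 1)
            f.write(b"\0")  # set EOF without writing the gap
        logical, allocated = allocation_report(path)
        print("logical:", logical, "allocated:", allocated)
        if allocated is not None and allocated < logical:
            print("stored sparsely: the space has not actually been reserved")
```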
Re: [ZODB-Dev] more lockup information / zope2.9.6+zodb 3.6.2
> How does data get into the ZEO storage then?

We have 10 ZEO clients that are for public consumption "READ ONLY". We have a separate ZEO client that is writing that is on a separate box.

The website got slashdotted this morning and we had 4 ZEO clients go out. They were basically waiting on the ZEO server for many minutes (hung, and reporting 500s back to browsers).

4/10/07 - delta ah2 lock up - http://paste.plone.org/13919
4/10/07 - epsilon ah2 lock up - http://paste.plone.org/13920
4/10/07 - delta ah1 lock up - http://paste.plone.org/13936
4/10/07 - epsilon ah3 lock up - http://paste.plone.org/13935

Just FYI: Varnish didn't go over 3% CPU during the traffic surge; over 200 req/second. The ZEO Server was consistently at 2% CPU. Lots of traffic was getting through the cache and back to pound which was then load balancing to the ZEO clients.

For Jim: We did not adjust the transaction timeout. Would that have helped in the case of READ's?

The customer was posting content throughout the slashdot.
The problem was that when the clients would update they would end up through 500s

I am seeing something *very* strange in zeo.log:

2007-04-10T12:20:53 ERROR ZEO.zrpc.Connection(S) (172.16.235.120:49351) Error caught in asyncore
Traceback (most recent call last):
  File "/usr/local/python-2.4.4/lib/python2.4/asyncore.py", line 69, in read
    obj.handle_read_event()
  File "/usr/local/python-2.4.4/lib/python2.4/asyncore.py", line 391, in handle_read_event
    self.handle_read()
  File "/usr/local/zope/lib/python/ZEO/zrpc/smac.py", line 147, in handle_read
    d = self.recv(8192)
  File "/usr/local/python-2.4.4/lib/python2.4/asyncore.py", line 343, in recv
    data = self.socket.recv(buffer_size)
error: (113, 'No route to host')

but several more of these:

2007-04-10T13:55:36 ERROR ZEO.zrpc.Connection(S) (172.16.235.119:44322) Error caught in asyncore
Traceback (most recent call last):
  File "/usr/local/python-2.4.4/lib/python2.4/asyncore.py", line 69, in read
    obj.handle_read_event()
  File "/usr/local/python-2.4.4/lib/python2.4/asyncore.py", line 391, in handle_read_event
    self.handle_read()
  File "/usr/local/zope/lib/python/ZEO/zrpc/smac.py", line 147, in handle_read
    d = self.recv(8192)
  File "/usr/local/python-2.4.4/lib/python2.4/asyncore.py", line 343, in recv
    data = self.socket.recv(buffer_size)
error: (110, 'Connection timed out')
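To see how the two failure modes are spread across the ZEO clients, the zeo.log entries can be tallied per client address and error. A rough Python 3 sketch, assuming the log layout shown above: a header line with timestamp, `ERROR ZEO.zrpc.Connection(S)` and `(ip:port)`, followed by a traceback whose last line is `error: (errno, 'message')`. The regular expressions and `tally_zeo_errors` are hypothetical, not a ZEO tool.

```python
import re
from collections import Counter

# Header, e.g. "2007-04-10T12:20:53 ERROR ZEO.zrpc.Connection(S) (172.16.235.120:49351) ..."
HEADER = re.compile(
    r"^(?P<ts>\d{4}-\d\d-\d\dT\d\d:\d\d:\d\d) ERROR ZEO\.zrpc\.Connection\(S\) "
    r"\((?P<ip>[\d.]+):\d+\)"
)
# Last traceback line, e.g. "error: (113, 'No route to host')"
ERROR = re.compile(r"^error: \((?P<code>\d+), '(?P<msg>[^']*)'\)")


def tally_zeo_errors(lines):
    """Count (client_ip, errno, message) triples found in zeo.log lines."""
    counts = Counter()
    current_ip = None
    for line in lines:
        line = line.strip()
        m = HEADER.match(line)
        if m:
            current_ip = m.group("ip")  # client the following traceback is about
            continue
        m = ERROR.match(line)
        if m and current_ip is not None:
            counts[(current_ip, int(m.group("code")), m.group("msg"))] += 1
            current_ip = None
    return counts


if __name__ == "__main__":
    sample = """\
2007-04-10T12:20:53 ERROR ZEO.zrpc.Connection(S) (172.16.235.120:49351) Error caught in asyncore
Traceback (most recent call last):
    data = self.socket.recv(buffer_size)
error: (113, 'No route to host')
2007-04-10T13:55:36 ERROR ZEO.zrpc.Connection(S) (172.16.235.119:44322) Error caught in asyncore
Traceback (most recent call last):
error: (110, 'Connection timed out')
""".splitlines()
    for (ip, code, msg), n in tally_zeo_errors(sample).items():
        print(n, ip, code, msg)
```

Running it over the full zeo.log would show whether the timeouts cluster on particular client machines, which would support the network-first diagnosis.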
Re: [ZODB-Dev] more lockup information / zope2.9.6+zodb 3.6.2
Alan Runyan wrote:

>> Do you have anything that is committing very large transactions?
>
> No. In fact, these clients could be running in read only mode, as far as I'm concerned.

How does data get into the ZEO storage then?

cheers,

Chris

--
Simplistix - Content Management, Zope & Python Consulting
           - http://www.simplistix.co.uk