Re: [ZODB-Dev] shared cache when no write?
On 14/12/2012, at 8:32 AM, Jim Fulton j...@zope.com wrote:

> On Thu, Dec 13, 2012 at 4:18 PM, Dylan Jay d...@pretaweb.com wrote:
> > ...
> > I'd never considered that the cache was attached to the db connection rather than the thread. I just reread http://docs.zope.org/zope2/zope2book/MaintainingZope.html and it says exactly that. So what you're saying is I'd tune db connections down to memory size on an instance dedicated to IO-bound work and then increase the threads. Whenever a thread requests a db connection and there isn't one available it will block. So I just optimize my app to release the db connection when not needed. In fact I could tune all my zopes this way, since a zope with 10 threads and 2 connections is going to end up queuing requests the same as 2 threads and 10 connections?
>
> Something like that. It's a little more complicated than that because Zope 2 is managing connections for you, so it would be easy to run afoul of that. This is a case where something that usually makes your life easier makes it harder. :)

True. With Plone, as you have many modules sharing the connection, all expecting it to be the same connection, closing the connection halfway through isn't possible. If it was closed and another connection opened, then the other modules that are outside of your control might have references to stale data.

> What I'd do is use a separate database other than the one Zope 2 is using. Then you can manage connections yourself without conflicting with what the publisher is doing. Then, when you want to use the database, you just open the database, being careful to close it when you're going to block. The downside being that you'll have separate transactions.
>
> > This should be easier to achieve and changes the application less than the erp5 background task solution mentioned.
>
> It would probably be a good idea to learn more about how ERP5 does this. The ERP5 approach sounds like a variation on what I suggested.

It's not always possible, as sometimes you need to feed the result back to the user immediately. Let's take another example: a Plone site with a page that lets you upload an mp3 file and it guesses the song, then combines that with your preference data to return other songs you might like. The song-guessing part is an external service and the preference data is stored in the same zodb as Plone. To do it the ERP5 background task way you'd deliver back a page with some javascript on it that polls the server to see if the song had been processed yet. This isn't always desirable, esp. if you have to avoid javascript.

Maybe another possibility is to do it the way ZODB handles streaming blobs. The blob streaming happens after the db connection is closed. Perhaps there could be a way to register a callback in zope for processing to happen after the db connection is closed but before the request is returned. At that point, I could make an external connection and combine the resulting data to modify the response object, perhaps in an async thread like blobs use. If I really wanted to write or read more data I could request a new thread and db connection at that point.

> > I can see from the previous post, as there is no checkout semantics in zodb,
>
> I don't know what checkout semantics means.

As in, the ZODB protocol doesn't have a call you have to make before you write to an object. You just write to the object and afterwards flag it as changed (if needed). So there isn't a way to block at the point of writing. Malthe's database had an explicit checkout action, so presumably you weren't allowed to mutate anything until you had checked it out. Not something you can introduce into ZODB.

> > you are free to write anytime so there is no sane way to block at the point someone wants to write to an object, so it wouldn't work.
>
> ZODB provides a very simple concurrency model by giving each connection (and in common practice, each thread) its own view of the database.
> If you break that, then you're injecting concurrency issues into the app or in some pretty magical layer.
>
> > You perhaps could have a single read only db connection which is shared?
>
> But even if the database data was only read, objects have other state that may be mutated. You'd have to inspect every class to make sure it's thread safe. That's too scary for me.
>
> Jim
>
> --
> Jim Fulton
> http://www.linkedin.com/in/jimfulton
> Jerky is better than bacon! http://zo.pe/Kqm

___
For more information about ZODB, see http://zodb.org/

ZODB-Dev mailing list - ZODB-Dev@zope.org
https://mail.zope.org/mailman/listinfo/zodb-dev
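
A note on the "checkout semantics" point in the message above: the following is a minimal sketch, in plain Python, of the write-then-flag pattern Dylan describes. The `Tracked` class is a hypothetical stand-in and not ZODB's persistent base class; only the attribute name `_p_changed` is borrowed from ZODB. It is meant to illustrate why there is no single point at which a writer could be made to block: an in-place change to a plain sub-object never passes through the container at all.

```python
class Tracked:
    """Toy object that records changes after the fact, with no checkout step."""

    def __init__(self):
        # Bypass our own __setattr__ so construction does not count as a change.
        object.__setattr__(self, "_p_changed", False)
        object.__setattr__(self, "tags", [])

    def __setattr__(self, name, value):
        object.__setattr__(self, name, value)
        if name != "_p_changed":
            # Attribute assignment is noticed and flagged automatically.
            object.__setattr__(self, "_p_changed", True)


obj = Tracked()
obj.tags.append("rock")      # in-place mutation of a plain list goes unnoticed
unnoticed = obj._p_changed   # the flag has not been touched by the append
obj._p_changed = True        # so the writer flags the change afterwards
```

In a design with an explicit checkout call, the `append` would have been preceded by a request that the system could delay or refuse; here the write has already happened by the time anything is flagged.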
Re: [ZODB-Dev] shared cache when no write?
Hi,

On Mon, Dec 17, 2012 at 10:03 PM, Dylan Jay d...@pretaweb.com wrote:

> On 14/12/2012, at 8:32 AM, Jim Fulton j...@zope.com wrote:
> > On Thu, Dec 13, 2012 at 4:18 PM, Dylan Jay d...@pretaweb.com wrote:
> > > ...
> > > I'd never considered that the cache was attached to the db connection rather than the thread. I just reread http://docs.zope.org/zope2/zope2book/MaintainingZope.html and it says exactly that. So what you're saying is I'd tune db connections down to memory size on an instance dedicated to IO-bound work and then increase the threads. Whenever a thread requests a db connection and there isn't one available it will block. So I just optimize my app to release the db connection when not needed. In fact I could tune all my zopes this way, since a zope with 10 threads and 2 connections is going to end up queuing requests the same as 2 threads and 10 connections?
> >
> > Something like that. It's a little more complicated than that because Zope 2 is managing connections for you, so it would be easy to run afoul of that. This is a case where something that usually makes your life easier makes it harder. :)
>
> True. With Plone, as you have many modules sharing the connection, all expecting it to be the same connection, closing the connection halfway through isn't possible. If it was closed and another connection opened, then the other modules that are outside of your control might have references to stale data.
>
> > What I'd do is use a separate database other than the one Zope 2 is using. Then you can manage connections yourself without conflicting with what the publisher is doing. Then, when you want to use the database, you just open the database, being careful to close it when you're going to block. The downside being that you'll have separate transactions.
> >
> > > This should be easier to achieve and changes the application less than the erp5 background task solution mentioned.
> >
> > It would probably be a good idea to learn more about how ERP5 does this. The ERP5 approach sounds like a variation on what I suggested.

Indeed, it's clear from all the proposed solutions (including DJ's reconnect after transaction end but before returning to the user) that you can't have, at the same time, a single ZODB transaction AND immediate user feedback when depending on an external system.

There's not much more to the ERP5 technique than what I already explained earlier. It boils down to:

* take user input
* store it as received with as little processing as possible
* trigger background activities (as few as possible) for anything that requires looking beyond the object the user is currently manipulating and its immediate vicinity (especially object reindexing)
* return info to the user as fast as possible, including any info telling him to check back later if necessary.

> It's not always possible, as sometimes you need to feed the result back to the user immediately. Let's take another example: a Plone site with a page that lets you upload an mp3 file and it guesses the song, then combines that with your preference data to return other songs you might like. The song-guessing part is an external service and the preference data is stored in the same zodb as Plone. To do it the ERP5 background task way you'd deliver back a page with some javascript on it that polls the server to see if the song had been processed yet. This isn't always desirable, esp. if you have to avoid javascript.

Avoiding JavaScript is possible with the same approach GitHub uses when forking a repo: a page with a meta http-equiv refresh and a message like "We're processing your request. This page will update itself when we're done. You may refresh it on your own if it makes you feel like you're in control."

Providing user feedback is usually less tricky than coping with system restrictions. As long as the user is seeing something happening, and the system feels like it's evolving towards a solution instead of seeming stuck, users tend to be satisfied. In your example, the user already waits quite a bit for his file upload to finish.
Having him wait on the external system to handle the data could be a bit too much; better to return some info to him and show the rest later.

[...]

Cheers,
Leo
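
The meta-refresh approach Leo mentions can be sketched in a few lines. This is only an illustration: the function name `status_page`, its parameters and the 5-second interval are assumptions, and how the job state is looked up is left out entirely.

```python
from html import escape


def status_page(job_done, result=None, refresh_seconds=5):
    """Return an HTML page that reloads itself until the job is done."""
    if job_done:
        # Final page: no refresh header, just the (escaped) result.
        return ("<html><body><p>Done: {}</p></body></html>"
                .format(escape(str(result))))
    # Interim page: the browser re-requests it every refresh_seconds.
    return (
        "<html><head>"
        '<meta http-equiv="refresh" content="{}">'
        "</head><body>"
        "<p>We're processing your request. "
        "This page will update itself when we're done.</p>"
        "</body></html>"
    ).format(int(refresh_seconds))
```

Each reload is an ordinary short request, so no thread or database connection is held while the external system does its work.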
Re: [ZODB-Dev] shared cache when no write?
On 18/12/2012, at 2:15 PM, Leonardo Rochael Almeida leoroch...@gmail.com wrote:

> Hi,
>
> On Mon, Dec 17, 2012 at 10:03 PM, Dylan Jay d...@pretaweb.com wrote:
> > On 14/12/2012, at 8:32 AM, Jim Fulton j...@zope.com wrote:
> > > On Thu, Dec 13, 2012 at 4:18 PM, Dylan Jay d...@pretaweb.com wrote:
> > > > ...
> > > > I'd never considered that the cache was attached to the db connection rather than the thread. I just reread http://docs.zope.org/zope2/zope2book/MaintainingZope.html and it says exactly that. So what you're saying is I'd tune db connections down to memory size on an instance dedicated to IO-bound work and then increase the threads. Whenever a thread requests a db connection and there isn't one available it will block. So I just optimize my app to release the db connection when not needed. In fact I could tune all my zopes this way, since a zope with 10 threads and 2 connections is going to end up queuing requests the same as 2 threads and 10 connections?
> > >
> > > Something like that. It's a little more complicated than that because Zope 2 is managing connections for you, so it would be easy to run afoul of that. This is a case where something that usually makes your life easier makes it harder. :)
> >
> > True. With Plone, as you have many modules sharing the connection, all expecting it to be the same connection, closing the connection halfway through isn't possible. If it was closed and another connection opened, then the other modules that are outside of your control might have references to stale data.
> >
> > > What I'd do is use a separate database other than the one Zope 2 is using. Then you can manage connections yourself without conflicting with what the publisher is doing. Then, when you want to use the database, you just open the database, being careful to close it when you're going to block. The downside being that you'll have separate transactions.
> > >
> > > > This should be easier to achieve and changes the application less than the erp5 background task solution mentioned.
> > >
> > > It would probably be a good idea to learn more about how ERP5 does this. The ERP5 approach sounds like a variation on what I suggested.
>
> Indeed, it's clear from all the proposed solutions (including DJ's reconnect after transaction end but before returning to the user) that you can't have, at the same time, a single ZODB transaction AND immediate user feedback when depending on an external system.
>
> There's not much more to the ERP5 technique than what I already explained earlier. It boils down to:
>
> * take user input
> * store it as received with as little processing as possible
> * trigger background activities (as few as possible) for anything that requires looking beyond the object the user is currently manipulating and its immediate vicinity (especially object reindexing)
> * return info to the user as fast as possible, including any info telling him to check back later if necessary.
>
> > It's not always possible, as sometimes you need to feed the result back to the user immediately. Let's take another example: a Plone site with a page that lets you upload an mp3 file and it guesses the song, then combines that with your preference data to return other songs you might like. The song-guessing part is an external service and the preference data is stored in the same zodb as Plone. To do it the ERP5 background task way you'd deliver back a page with some javascript on it that polls the server to see if the song had been processed yet. This isn't always desirable, esp. if you have to avoid javascript.
>
> Avoiding JavaScript is possible with the same approach GitHub uses when forking a repo: a page with a meta http-equiv refresh and a message like "We're processing your request. This page will update itself when we're done. You may refresh it on your own if it makes you feel like you're in control."
>
> Providing user feedback is usually less tricky than coping with system restrictions.
> As long as the user is seeing something happening, and the system feels like it's evolving towards a solution instead of seeming stuck, users tend to be satisfied. In your example, the user already waits quite a bit for his file upload to finish. Having him wait on the external system to handle the data could be a bit too much; better to return some info to him and show the rest later.

True, you could do it that way for certain types of requests. The real-life situation I was involved with had a backend response time of between 1 and 3 seconds: long enough to cause scalability issues on the server by running out of connections, but not so long that the customers were prepared to have a UI that autorefreshed or used ajax, esp. since plenty of other technologies don't have this limitation (or, as Jim pointed out, they do have this limitation but it isn't as bad). Also, if you are proxying another external application, then it would be a lot of work to rework each page to be asynchronous.

> [...]
>
> Cheers,
> Leo
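
To put rough numbers on the 1 to 3 second backend case above: if every request holds one connection for its whole duration, throughput is bounded by the number of connections divided by the time each is held. The sketch below works that out; the 4 connections and the 0.1 s of local work per request are assumed figures chosen only for illustration, not values from the thread.

```python
def max_requests_per_second(connections, hold_seconds):
    """Upper bound on throughput when each request holds one connection."""
    return connections / hold_seconds


CONNECTIONS = 4   # assumed: how many ZODB caches fit in RAM
OWN_WORK = 0.1    # assumed: seconds of local ZODB/CPU work per request

for backend_seconds in (1.0, 3.0):
    # Connection held across the external call.
    held = max_requests_per_second(CONNECTIONS, OWN_WORK + backend_seconds)
    # Connection released while waiting on the external call.
    released = max_requests_per_second(CONNECTIONS, OWN_WORK)
    print("backend %.0fs: %.1f req/s held, %.1f req/s released"
          % (backend_seconds, held, released))
```

Under these assumptions the bound drops by an order of magnitude or more when the connection is held across the external call, which matches the "low throughput while still having low CPU" symptom described at the start of the thread.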
Re: [ZODB-Dev] shared cache when no write?
On Wed, Dec 12, 2012 at 6:31 PM, Dylan Jay d...@pretaweb.com wrote:

> Hi,
>
> I've been working with zope for over 12 years and something that keeps coming up is scaling IO bound operations in Zope. The typical example is where you build an app that calls external apis. While this is happening a zope thread isn't doing any other processing, and because there is a 1 thread, 1 zodb cache limit, you can run into scalability problems, as you can only have as many threads as your RAM / average cache size. The end result is low throughput while still having low CPU. I've consulted on some $$$ sites where others have made this mistake. It's an easy mistake to make, as SQL/PHP systems don't tend to have this limitation, so developers new to zope often don't think of it.

I was listening to a talk by a Java guy on Friday where he warned that a common newbie mistake was to have too large a database connection pool, causing lots of RAM usage. I expect, though, that ZODB caches, consisting of live Python objects, exacerbate this effect.

> The possible workarounds aren't pretty. You can segregate your api-calling requests to zeo clients with large numbers of threads with small caches using some fancy load balancing rules. You can rework that part of your application to not use zope, perhaps using edge side includes to make it seem part of the same app.
>
> Feel free to shoot down the following if it makes no sense. What if two or more threads could share a zodb cache up until the point at which one wants to write? This is the point at which you can't share a cache in a consistent manner, in my understanding. At that point the transaction could be blocked until other readonly transactions had finished and then continue by itself? Or perhaps the write transaction could be aborted and restarted with a special flag to ensure it was processed with the cache to itself. As long as requests which involve external access are readonly with regard to zope then this would improve throughput.
> This might seem an edge case, but consider where you want to integrate an external app into a zope or Plone app. Often the external api is doing the writing, not the zope part. For example, clicking a button on a plone site to make plone send a tweet. It might also improve throughput on zope requests which involve zodb cache misses, as they are also IO bound.

A simpler approach might be to manage connections better at the application level so you don't need so many of them. If you're going to spend a lot of time blocked waiting on some external service, why not close the database connection and reopen it when you need it? Then you could have a lot more threads than database connections.

It's possible that ZODB could help at the savepoint level. For example, maybe you could somehow allow savepoints to be used across transactions and connections. This would be a lot saner than trying to share a cache across threads.

Jim

--
Jim Fulton
http://www.linkedin.com/in/jimfulton
Jerky is better than bacon! http://zo.pe/Kqm
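
Jim's suggestion of closing the connection while blocked can be illustrated without ZODB at all. The sketch below uses a hypothetical `ConnectionPool` built on a semaphore as a stand-in for a database with a fixed number of connections; the class, the function names and the sleep that simulates the external API are all assumptions made for the example. Ten threads share two "connections", and none of them holds one while waiting on the external call.

```python
import threading
import time
from contextlib import contextmanager


class ConnectionPool:
    """Toy stand-in for a database with a fixed number of connections."""

    def __init__(self, size):
        self._slots = threading.BoundedSemaphore(size)
        self._lock = threading.Lock()
        self.in_use = 0
        self.peak_in_use = 0

    @contextmanager
    def connection(self):
        self._slots.acquire()            # blocks when every connection is taken
        with self._lock:
            self.in_use += 1
            self.peak_in_use = max(self.peak_in_use, self.in_use)
        try:
            yield object()               # placeholder for a real connection
        finally:
            with self._lock:
                self.in_use -= 1
            self._slots.release()


def call_external_service():
    time.sleep(0.05)                     # simulated slow external API
    return "remote result"


def handle_request(pool, results):
    with pool.connection():
        prefs = "preferences"            # read what is needed, then let go
    remote = call_external_service()     # no connection held while blocked
    with pool.connection():
        results.append((prefs, remote))  # reopen only to use the result


def run(threads=10, connections=2):
    pool = ConnectionPool(connections)
    results = []
    workers = [threading.Thread(target=handle_request, args=(pool, results))
               for _ in range(threads)]
    for worker in workers:
        worker.start()
    for worker in workers:
        worker.join()
    return pool, results
```

Calling `run()` completes all ten requests while never using more than two connections at once. In a real application the two `with` blocks would be separate units of database work, so anything read in the first may need re-checking in the second.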
Re: [ZODB-Dev] shared cache when no write?
Hi

In ERP5, the rule is that you should never talk to external systems as a synchronous response to a user request, and you should avoid, at all costs, writing to ZODB at the same time as talking to external systems (or the external system must be able to handle this gracefully). Of course, it helps a lot if you have a reliable background task mechanism.

In the example given, the click to send the tweet would cause an app written the ERP5 way to simply register all the information necessary to send the tweet into ZODB, trigger a background activity to send the tweet later, and immediately return to the user (browser). Then, one of the nodes dedicated to executing background activities picks up the tweet-sending activity and can spend as much time as necessary in the venture, and finally triggers another background activity to actually write to the ZODB that the sending was successful. This last part is to avoid sending the tweet twice on account of a conflict error.

You can do something similar with zc.async and friends (like plone.app.async), but make sure you get your configuration right so that conflict errors are resolved automatically in the storage of background activities (ERP5 uses MySQL for storage and coordination of background activities, so no fear of conflict errors there).

Cheers,
Leo

On Thu, Dec 13, 2012 at 10:07 AM, Jim Fulton j...@zope.com wrote:

> On Wed, Dec 12, 2012 at 6:31 PM, Dylan Jay d...@pretaweb.com wrote:
> > Hi,
> >
> > I've been working with zope for over 12 years and something that keeps coming up is scaling IO bound operations in Zope. The typical example is where you build an app that calls external apis. While this is happening a zope thread isn't doing any other processing, and because there is a 1 thread, 1 zodb cache limit, you can run into scalability problems, as you can only have as many threads as your RAM / average cache size. The end result is low throughput while still having low CPU.
> > I've consulted on some $$$ sites where others have made this mistake. It's an easy mistake to make, as SQL/PHP systems don't tend to have this limitation, so developers new to zope often don't think of it.
>
> I was listening to a talk by a Java guy on Friday where he warned that a common newbie mistake was to have too large a database connection pool, causing lots of RAM usage. I expect, though, that ZODB caches, consisting of live Python objects, exacerbate this effect.
>
> > The possible workarounds aren't pretty. You can segregate your api-calling requests to zeo clients with large numbers of threads with small caches using some fancy load balancing rules. You can rework that part of your application to not use zope, perhaps using edge side includes to make it seem part of the same app.
> >
> > Feel free to shoot down the following if it makes no sense. What if two or more threads could share a zodb cache up until the point at which one wants to write? This is the point at which you can't share a cache in a consistent manner, in my understanding. At that point the transaction could be blocked until other readonly transactions had finished and then continue by itself? Or perhaps the write transaction could be aborted and restarted with a special flag to ensure it was processed with the cache to itself. As long as requests which involve external access are readonly with regard to zope then this would improve throughput.
> >
> > This might seem an edge case, but consider where you want to integrate an external app into a zope or Plone app. Often the external api is doing the writing, not the zope part. For example, clicking a button on a plone site to make plone send a tweet. It might also improve throughput on zope requests which involve zodb cache misses, as they are also IO bound.
>
> A simpler approach might be to manage connections better at the application level so you don't need so many of them.
> If you're going to spend a lot of time blocked waiting on some external service, why not close the database connection and reopen it when you need it? Then you could have a lot more threads than database connections.
>
> It's possible that ZODB could help at the savepoint level. For example, maybe you could somehow allow savepoints to be used across transactions and connections. This would be a lot saner than trying to share a cache across threads.
>
> Jim
>
> --
> Jim Fulton
> http://www.linkedin.com/in/jimfulton
> Jerky is better than bacon! http://zo.pe/Kqm
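
The flow Leo describes for the tweet example (record the input, queue a background activity, return at once, then record success in a separate follow-up activity) might look roughly like this. It is a toy, in-process sketch using the standard library: `ActivityRunner`, its method names and the plain dict standing in for ZODB are all hypothetical, and ERP5's actual activity system coordinates through MySQL across nodes rather than through a queue in one process.

```python
import queue
import threading


class ActivityRunner:
    """Toy background-activity node with an in-memory queue and store."""

    def __init__(self, send):
        self._send = send                # callable that talks to the external system
        self._queue = queue.Queue()
        self.store = {}                  # stand-in for data persisted in ZODB
        worker = threading.Thread(target=self._run, daemon=True)
        worker.start()

    def submit(self, tweet_id, text):
        """Request-handler side: record the input, queue the work, return at once."""
        self.store[tweet_id] = {"text": text, "status": "pending"}
        self._queue.put(("send", tweet_id))
        return "queued"

    def _run(self):
        while True:
            kind, tweet_id = self._queue.get()
            record = self.store[tweet_id]
            if kind == "send" and record["status"] == "pending":
                self._send(record["text"])
                # Follow-up activity: recording success is its own step.
                self._queue.put(("record", tweet_id))
            elif kind == "record":
                record["status"] = "sent"
            self._queue.task_done()

    def wait(self):
        """Block until every queued activity has been processed."""
        self._queue.join()
```

Because the send activity checks the recorded status first, a send that is retried after success has been recorded does nothing, which is the "avoid sending the tweet twice" property from the message above. The window between sending and recording is not closed in this sketch.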
Re: [ZODB-Dev] shared cache when no write?
On Thu, Dec 13, 2012 at 12:11 PM, Leonardo Rochael Almeida leoroch...@gmail.com wrote:

> (or the external system must be able to handle this gracefully).

By this I meant conflict errors. For example, SMTP servers posting INTO Zope can retry sending later in case of error.
Re: [ZODB-Dev] shared cache when no write?
On 13/12/2012, at 11:07 PM, Jim Fulton j...@zope.com wrote:

> On Wed, Dec 12, 2012 at 6:31 PM, Dylan Jay d...@pretaweb.com wrote:
> > Hi,
> >
> > I've been working with zope for over 12 years and something that keeps coming up is scaling IO bound operations in Zope. The typical example is where you build an app that calls external apis. While this is happening a zope thread isn't doing any other processing, and because there is a 1 thread, 1 zodb cache limit, you can run into scalability problems, as you can only have as many threads as your RAM / average cache size. The end result is low throughput while still having low CPU. I've consulted on some $$$ sites where others have made this mistake. It's an easy mistake to make, as SQL/PHP systems don't tend to have this limitation, so developers new to zope often don't think of it.
>
> I was listening to a talk by a Java guy on Friday where he warned that a common newbie mistake was to have too large a database connection pool, causing lots of RAM usage. I expect, though, that ZODB caches, consisting of live Python objects, exacerbate this effect.
>
> > The possible workarounds aren't pretty. You can segregate your api-calling requests to zeo clients with large numbers of threads with small caches using some fancy load balancing rules. You can rework that part of your application to not use zope, perhaps using edge side includes to make it seem part of the same app.
> >
> > Feel free to shoot down the following if it makes no sense. What if two or more threads could share a zodb cache up until the point at which one wants to write? This is the point at which you can't share a cache in a consistent manner, in my understanding. At that point the transaction could be blocked until other readonly transactions had finished and then continue by itself? Or perhaps the write transaction could be aborted and restarted with a special flag to ensure it was processed with the cache to itself.
> > As long as requests which involve external access are readonly with regard to zope then this would improve throughput.
> >
> > This might seem an edge case, but consider where you want to integrate an external app into a zope or Plone app. Often the external api is doing the writing, not the zope part. For example, clicking a button on a plone site to make plone send a tweet. It might also improve throughput on zope requests which involve zodb cache misses, as they are also IO bound.
>
> A simpler approach might be to manage connections better at the application level so you don't need so many of them. If you're going to spend a lot of time blocked waiting on some external service, why not close the database connection and reopen it when you need it? Then you could have a lot more threads than database connections.

I'd never considered that the cache was attached to the db connection rather than the thread. I just reread http://docs.zope.org/zope2/zope2book/MaintainingZope.html and it says exactly that. So what you're saying is I'd tune db connections down to memory size on an instance dedicated to IO-bound work and then increase the threads. Whenever a thread requests a db connection and there isn't one available it will block. So I just optimize my app to release the db connection when not needed. In fact I could tune all my zopes this way, since a zope with 10 threads and 2 connections is going to end up queuing requests the same as 2 threads and 10 connections? This should be easier to achieve and changes the application less than the erp5 background task solution mentioned.

> It's possible that ZODB could help at the savepoint level. For example, maybe you could somehow allow savepoints to be used across transactions and connections. This would be a lot saner than trying to share a cache across threads.

I can see from the previous post, as there is no checkout semantics in zodb, you are free to write anytime so there is no sane way to block at the point someone wants to write to an object, so it wouldn't work. You perhaps could have a single read only db connection which is shared? So in the case above, during IO-bound operations or if you knew you never wanted to write, you could close the normal connection and open a read-only one.

> Jim
>
> --
> Jim Fulton
> http://www.linkedin.com/in/jimfulton
> Jerky is better than bacon! http://zo.pe/Kqm
Re: [ZODB-Dev] shared cache when no write?
On Thu, Dec 13, 2012 at 4:18 PM, Dylan Jay d...@pretaweb.com wrote:

> ...
> I'd never considered that the cache was attached to the db connection rather than the thread. I just reread http://docs.zope.org/zope2/zope2book/MaintainingZope.html and it says exactly that. So what you're saying is I'd tune db connections down to memory size on an instance dedicated to IO-bound work and then increase the threads. Whenever a thread requests a db connection and there isn't one available it will block. So I just optimize my app to release the db connection when not needed. In fact I could tune all my zopes this way, since a zope with 10 threads and 2 connections is going to end up queuing requests the same as 2 threads and 10 connections?

Something like that. It's a little more complicated than that because Zope 2 is managing connections for you, so it would be easy to run afoul of that. This is a case where something that usually makes your life easier makes it harder. :)

What I'd do is use a separate database other than the one Zope 2 is using. Then you can manage connections yourself without conflicting with what the publisher is doing. Then, when you want to use the database, you just open the database, being careful to close it when you're going to block. The downside being that you'll have separate transactions.

> This should be easier to achieve and changes the application less than the erp5 background task solution mentioned.

It would probably be a good idea to learn more about how ERP5 does this. The ERP5 approach sounds like a variation on what I suggested.

> I can see from the previous post, as there is no checkout semantics in zodb,

I don't know what checkout semantics means.

> you are free to write anytime so there is no sane way to block at the point someone wants to write to an object, so it wouldn't work.

ZODB provides a very simple concurrency model by giving each connection (and in common practice, each thread) its own view of the database.
If you break that, then you're injecting concurrency issues into the app or in some pretty magical layer.

> You perhaps could have a single read only db connection which is shared?

But even if the database data was only read, objects have other state that may be mutated. You'd have to inspect every class to make sure it's thread safe. That's too scary for me.

Jim

--
Jim Fulton
http://www.linkedin.com/in/jimfulton
Jerky is better than bacon! http://zo.pe/Kqm
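
Jim's last point, that objects carry state which may be mutated even when the stored data is only read, can be shown with a small sketch. The `Playlist` class is a hypothetical plain-Python stand-in, not a persistent class; the only ZODB-specific detail relied on is the convention that attributes whose names start with `_v_` are volatile and never written to the database.

```python
class Playlist:
    """Plain-Python stand-in for a persistent object with a lazy cache."""

    def __init__(self, durations):
        self.durations = list(durations)

    def total(self):
        # Looks read-only to the caller, but fills a cache on first use.
        if not hasattr(self, "_v_total"):
            self._v_total = sum(self.durations)
        return self._v_total


playlist = Playlist([180, 240])
before = set(vars(playlist))   # attributes present before the "read"
playlist.total()
after = set(vars(playlist))    # the read has added a cached attribute
```

Nothing was written to the database here, yet the in-memory object changed. With a cache shared between threads, two readers could run this check-then-set at the same time, and every class would need auditing for patterns like it.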