Re: [ZODB-Dev] [OT] NoSQL
On Sun, 2009-11-15 at 00:31 -0700, Shane Hathaway wrote:
> Roché Compaan wrote:
> > On Sat, 2009-11-14 at 14:23 -0700, Shane Hathaway wrote:
> > > I think proper construction of horizontally scalable databases must be done partly at application level, since a lot of the issues to be solved are specific to the application.
> >
> > What are the issues you're talking about?
>
> Every database system has almost countless issues to balance, such as durability, consistency, performance, freshness, availability, etc. The demands of horizontal scaling make the issues too complex to completely delegate to a database layer.

These concerns don't disappear when implementing a solution to big databases at application level. In my experience it becomes even more complex at application level and you have to do an inordinate amount of configuration to manage partitions. With the z3c.sharding implementation you would have to configure multiple containers.

I can't see why it wouldn't be possible to develop a ZODB storage similar to Hypertable:

http://code.google.com/p/hypertable/wiki/ArchitecturalOverview

-- 
Roché Compaan
Upfront Systems http://www.upfrontsystems.co.za
___
For more information about ZODB, see the ZODB Wiki: http://www.zope.org/Wikis/ZODB/
ZODB-Dev mailing list - ZODB-Dev@zope.org
https://mail.zope.org/mailman/listinfo/zodb-dev
Re: [ZODB-Dev] [OT] NoSQL
On 11/13/09 21:33, Shane Hathaway wrote:
> I've been studying how to build an enormous database based on what I know. There are an incredible number of distributed databases these days, but all of them concern me in one way or another.

Can you share some of those concerns with us? I'd be interested to hear what kind of problems you see.

Wichert.
Re: [ZODB-Dev] [OT] NoSQL
Roché Compaan wrote:
> On Fri, 2009-11-13 at 13:33 -0700, Shane Hathaway wrote:
> > Stephan Richter wrote:
> > > http://svn.zope.org/z3c.sharding/trunk
>
> Great stuff! This approaches scaling a large data set at application level though. Don't you think a ZODB storage doing this for you would solve the problem more generally?

I think proper construction of horizontally scalable databases must be done partly at application level, since a lot of the issues to be solved are specific to the application.

> I think that the master index needs to be partitioned as well. In benchmarks I performed early last year (http://bit.ly/pSVmd), a BTree could only handle about 250 inserts/second when it approached 10 million objects, so I'm guessing it will be almost unusable at 100 million.

Right. The top level probably ought to be a dynamic hash, not a BTree. I intended z3c.sharding more as a proof of concept.

Shane
Re: [ZODB-Dev] [OT] NoSQL
Wichert Akkerman wrote:
> On 11/13/09 21:33, Shane Hathaway wrote:
> > I've been studying how to build an enormous database based on what I know. There are an incredible number of distributed databases these days, but all of them concern me in one way or another.
>
> Can you share some of those concerns with us? I'd be interested to hear what kind of problems you see.

The best article I've found is a simple presentation and overview:

http://highscalability.com/blog/2009/11/5/a-yes-for-a-nosql-taxonomy.html

The article neatly categorizes a lot of the NoSQL databases. It suggests that document stores have the right level of complexity. Wide columnar stores like Cassandra could be too complicated to gain a lot of traction, while simpler databases might lack features we would normally take for granted.

In other articles, I learned about CouchDB conflict resolution. CouchDB allows any conflict and stores both conflicting values, expecting the application to resolve the conflict later. Clearly, CouchDB is designed to solve the PDA use case: I change a contact's phone number differently on my PDA and my desktop, then when I sync, I click some UI button to indicate which is correct. I think that sort of conflict resolution would cause security holes for the application I am working on, but it would probably work for a lot of other applications.

Current versions of CouchDB expect applications to scale using replication. Replication is not a substitute for sharding. The couchdb-lounge project seems to be solving that with proxies:

http://tilgovi.github.com/couchdb-lounge/

The MongoDB guys have a pretty thorough and fair comparison of CouchDB and MongoDB:

http://www.mongodb.org/display/DOCS/Comparing+Mongo+DB+and+Couch+DB

The bottom of the page lists use cases for MongoDB. It says people building a system with very critical transactions should choose a traditional RDBMS. That seems like reasonable advice for the application I'm building, except that I consider ZODB to be at least as reliable as an RDBMS. (RelStorage uses a subset of RDBMS functionality that I have found to be reliable.)

I think that by "very critical", the MongoDB authors are referring to applications that must not allow conflicting updates. Conflict resolution is probably my main concern with all of these new databases. I have no doubts about ZODB's conflict resolution policy, while I can imagine a variety of different policies these other databases might implement. A four- or five-dimensional hash like Cassandra might even have a conflict resolution policy that changes with every release.

Shane
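To make the contrast between conflict resolution policies concrete, here is a minimal sketch of two of them in plain Python. It is an illustration under stated assumptions, not code from ZODB or CouchDB: the function names and the ConflictError class are hypothetical, and only the argument order (common ancestor, already-committed state, state being committed) is borrowed from ZODB's `_p_resolveConflict(oldState, savedState, newState)` hook.

```python
class ConflictError(Exception):
    """Hypothetical error: concurrent changes cannot be merged safely."""


def resolve_counter(old_state, saved_state, new_state):
    """Three-way merge for a counter.

    old_state   -- value both transactions started from
    saved_state -- value already committed by the other transaction
    new_state   -- value this transaction is trying to commit
    """
    # Apply this transaction's delta on top of the committed value.
    return saved_state + (new_state - old_state)


def resolve_field(old_state, saved_state, new_state):
    """Strict policy for a field where diverging edits must never be merged."""
    if saved_state == old_state:
        return new_state      # nobody else changed the field
    if new_state == old_state or new_state == saved_state:
        return saved_state    # we did not change it, or both sides agree
    # Diverging edits: refuse instead of keeping both values for later.
    raise ConflictError("concurrent edits to the same field")
```

The counter policy is safe to automate because the two deltas commute. The field policy rejects the second writer at commit time, which is the opposite of storing both values and asking the application to choose later.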
Re: [ZODB-Dev] [OT] NoSQL
On Sat, 2009-11-14 at 14:23 -0700, Shane Hathaway wrote:
> Roché Compaan wrote:
> > On Fri, 2009-11-13 at 13:33 -0700, Shane Hathaway wrote:
> > > Stephan Richter wrote:
> > > > http://svn.zope.org/z3c.sharding/trunk
> >
> > Great stuff! This approaches scaling a large data set at application level though. Don't you think a ZODB storage doing this for you would solve the problem more generally?
>
> I think proper construction of horizontally scalable databases must be done partly at application level, since a lot of the issues to be solved are specific to the application.

What are the issues you're talking about?

> > I think that the master index needs to be partitioned as well. In benchmarks I performed early last year (http://bit.ly/pSVmd), a BTree could only handle about 250 inserts/second when it approached 10 million objects, so I'm guessing it will be almost unusable at 100 million.
>
> Right. The top level probably ought to be a dynamic hash, not a BTree. I intended z3c.sharding more as a proof of concept.

Sure, just thought I'd mention it. Also keep in mind that the dynamic hash needs to handle the introduction of new partitions.

-- 
Roché Compaan
Upfront Systems http://www.upfrontsystems.co.za
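The thread does not say which dynamic hashing scheme is meant. As one possible illustration of a top-level hash that tolerates new partitions, here is a minimal consistent-hashing sketch. The technique is swapped in as an assumption, and the HashRing class, its method names, and the partition names are all hypothetical; nothing here is taken from z3c.sharding.

```python
import bisect
import hashlib


def _point(value: str) -> int:
    """Map a string to a stable position on the ring."""
    return int(hashlib.sha256(value.encode("utf-8")).hexdigest(), 16)


class HashRing:
    """Hypothetical key -> partition map based on consistent hashing."""

    def __init__(self, partitions=(), replicas=64):
        self._replicas = replicas   # virtual points per partition
        self._points = []           # sorted ring positions
        self._owner = {}            # ring position -> partition name
        for name in partitions:
            self.add_partition(name)

    def add_partition(self, name):
        """Place the partition's virtual points on the ring."""
        for i in range(self._replicas):
            p = _point(f"{name}#{i}")
            self._owner[p] = name
            bisect.insort(self._points, p)

    def partition_for(self, key):
        """Return the partition owning the first point after the key."""
        if not self._points:
            raise LookupError("no partitions configured")
        i = bisect.bisect(self._points, _point(key)) % len(self._points)
        return self._owner[self._points[i]]


ring = HashRing(["part-a", "part-b", "part-c", "part-d"])
keys = [f"member-{n}" for n in range(1000)]
before = {k: ring.partition_for(k) for k in keys}
ring.add_partition("part-e")
after = {k: ring.partition_for(k) for k in keys}
moved = [k for k in keys if before[k] != after[k]]
```

With this layout, a key changes owner only when the new partition takes it over, so existing partitions never have to exchange data among themselves when a partition is introduced.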
Re: [ZODB-Dev] [OT] NoSQL
Roché Compaan wrote:
> On Sat, 2009-11-14 at 14:23 -0700, Shane Hathaway wrote:
> > I think proper construction of horizontally scalable databases must be done partly at application level, since a lot of the issues to be solved are specific to the application.
>
> What are the issues you're talking about?

Every database system has almost countless issues to balance, such as durability, consistency, performance, freshness, availability, etc. The demands of horizontal scaling make the issues too complex to completely delegate to a database layer.

> Also keep in mind that the dynamic hash needs to handle the introduction of new partitions.

Certainly.

Shane
Re: [ZODB-Dev] [OT] NoSQL
On 14.11.09 23:33, Shane Hathaway wrote:
> I think that by very critical, the MongoDB authors are referring to applications that must not allow conflicting updates. Conflict resolution is probably my main concern with all of these new databases. I have no doubts about ZODB's conflict resolution policy, while I can imagine a variety of different policies these other databases might implement. A four or five dimensional hash like Cassandra might even have a conflict resolution policy that changes with every release.

My high-level comment: choose the right tool for each individual problem. We have been building hybrid applications on top of Zope using the ZODB and an RDBMS for years. I can imagine building a Web-2.0-ish application on top of an RDBMS for storing personal data (where transaction integration is a must) and using something like MongoDB for mass data. You have to analyze which data are important and which are less important, and then choose the backend accordingly.

On MongoDB: I ran lots of tests with MongoDB lately and found it pretty amazing, fast and reliable. The replication support in particular looks good, and the sharding functionality (although still alpha or beta) appears promising. But the speed has its price: atomicity only for single-document entities.

Andreas

-- 
Andreas Jung
CEO, ZOPYX Ltd. Co. KG
www.zopyx.com
i...@zopyx.com
Re: [ZODB-Dev] [OT] NoSQL
Hello,

I think we can look at this at two levels:

1. At the application level. Your app uses ZODB and also talks to a NoSQL contender directly. Then this is your app's problem/responsibility, and it's up to you to deal with it.

2. At the ZODB storage level. As far as I can see, that level needs consistency, transactions and locking support. Those are usually missing from NoSQL implementations (unless I'm missing some). OTOH, a key-value store would fit the ZODB storage. If someone finds or writes a key-value storage that has the above properties, we could give it a try.

Thursday, November 12, 2009, 8:24:43 PM, you wrote:

SH> Encolpe Degoute wrote:
SH> > Is there someone in the ZODB development team following this:
SH> > http://www.rackspacecloud.com/blog/2009/11/09/nosql-ecosystem/

SH> It is possible that ZODB unfortunately occupies the same space as SQL in
SH> the CAP triangle:

SH> http://camelcase.blogspot.com/2007/08/cap-theorem.html

SH> That is to say, ZODB applications require consistency and availability,
SH> so if the CAP theorem is true, then ZODB applications can not be very
SH> partition-tolerant.

SH> The NoSQL databases provide availability and partition tolerance while
SH> foregoing absolute consistency.

SH> Shane

-- 
Best regards,
Adam GROSZER mailto:agros...@gmail.com
--
Quote of the day:
For a good time, call 836-3100.
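As a rough picture of what "a key-value storage with transactions and locking" means at the smallest possible scale, here is a single-process sketch. Everything in it is hypothetical: the TransactionalKV class and its method names are made up for illustration and are not the ZODB storage interface. It also ignores durability and distribution, which are the hard parts for any NoSQL backend.

```python
import threading


class TransactionalKV:
    """Hypothetical in-memory key-value store with all-or-nothing commits."""

    def __init__(self):
        self._data = {}
        self._commit_lock = threading.Lock()  # one writer at a time
        self._pending = None                  # buffered writes of the open transaction

    def begin(self):
        """Take the commit lock and start buffering writes."""
        self._commit_lock.acquire()
        self._pending = {}

    def store(self, key, value):
        if self._pending is None:
            raise RuntimeError("store() called outside a transaction")
        self._pending[key] = value

    def load(self, key):
        """Read committed data only; uncommitted writes are invisible."""
        return self._data[key]

    def commit(self):
        if self._pending is None:
            raise RuntimeError("commit() called outside a transaction")
        self._data.update(self._pending)      # apply every buffered write
        self._finish()

    def abort(self):
        if self._pending is None:
            raise RuntimeError("abort() called outside a transaction")
        self._finish()                        # drop every buffered write

    def _finish(self):
        self._pending = None
        self._commit_lock.release()
```

A store that cannot offer at least this much (serialized commits, no partially visible transactions) would push the consistency work back up into the application, which is level 1 above.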
Re: [ZODB-Dev] [OT] NoSQL
On Fri, 2009-11-13 at 10:58 +0100, Christian Theune wrote:
> On 11/13/2009 10:42 AM, Adam GROSZER wrote:
> > Hello, I think we can look at this at 2 levels. 1.: As your app uses ZODB. Then this is your app's problem/responsibility. You use a nosql contender directly from your app and it's your responsibility to deal with it. 2.: On the ZODB Storage level. So far I can see that level needs consistency, transactions and locking support. Those are usually missing from nosql implementations (unless I miss some). OTOH a key-value store would fit the ZODB storage. If someone finds/writes a key-value storage that has the above properties we could give it a try.
>
> Looking at the article referenced by Shane I understand that we'd have to drop consistency for being able to use such a store -- I feel that's not a good idea in the face of ZODB. The applications we write are intended to run consistently without having application-level (or even user-based) reconciliation work to do if a transaction came through.

I think there is something more important to notice than the dropping of consistency, and that is scaling easily to handle very large datasets. A data partitioning strategy implemented at application level is very difficult to get right. A ZODB storage that is distributed across machines could become a big selling point for the ZODB, and would be very convenient for those of us who sometimes have the rare opportunity to write applications that expect data sets in excess of 100 million records.

We had such an opportunity about 2 years ago, and although the client never reached (and probably never will reach) the membership they dreamed about, they did pay us to develop a storage for members that could scale to more than 100 million members. We implemented a data partitioning strategy at application level. If I had another shot at it, I would try to develop a distributed ZODB storage, because it would be a lot simpler than what we had to do at application level.

-- 
Roché Compaan
Upfront Systems http://www.upfrontsystems.co.za
Re: [ZODB-Dev] [OT] NoSQL
On Friday 13 November 2009, Roché Compaan wrote:
> We had such an opportunity about 2 years ago and although the client never reached (and probably will never) reach the membership they dreamed about, they did pay us to develop a storage for members that could scale to more than a 100 million members. We implemented a data partitioning strategy at application level. If I had another shot at it, I would try and develop a distributed ZODB storage, because it would be a lot simpler compared to what we had to do at application level.

Note that Shane developed a sharding solution a year ago with me. It provides container-level partitioning.

http://svn.zope.org/z3c.sharding/trunk

This, in combination with the encryption work that we did for the ZODB, makes the ZODB actually a lot more advanced than some of the newcomers.

I am very intrigued now to set up an EC2 cluster and install a z3c.sharding-based solution demonstrating 100M users with some data. Mmmh...

Regards,
Stephan

-- 
Entrepreneur and Software Geek
Google me. "Zope Stephan Richter"
Re: [ZODB-Dev] [OT] NoSQL
Stephan Richter wrote:
> On Friday 13 November 2009, Roché Compaan wrote:
> > We had such an opportunity about 2 years ago and although the client never reached (and probably will never) reach the membership they dreamed about, they did pay us to develop a storage for members that could scale to more than a 100 million members. We implemented a data partitioning strategy at application level. If I had another shot at it, I would try and develop a distributed ZODB storage, because it would be a lot simpler compared to what we had to do at application level.
>
> Note that Shane developed a sharding solution a year ago with me. It provides container-level partitioning.
>
> http://svn.zope.org/z3c.sharding/trunk

Thanks for the reminder. :-)

> This in combination with the encryption work that we did for the ZODB makes the ZODB actually be a lot more advanced than some of the new comers. I am very intrigued now to setup an EC2 cluster and install a z3c.sharding based solution demonstrating 100M users with some data. Mmmh...

I've been studying how to build an enormous database based on what I know. There are an incredible number of distributed databases these days, but all of them concern me in one way or another. I'm wondering if ZODB might actually have a fighting chance in the distributed database realm. With z3c.sharding or something like it, I think I would set things up as follows:

- In-memory ZODB caches would probably be pointlessly painful at that scale, so I would set the ZODB cache size for all partitions to 0. A cache size of 0 allows ZODB to cache for the duration of a request, but flushes all objects out of the cache at transaction boundaries.

- With the cache size set to 0, we can disable cache invalidation, which will probably be a major win.

- I would rely heavily on memcached to provide the pickles. I would try to use the cache checkpointing algorithm I recently added to RelStorage.

- I would aim to read or write only a small number of objects per request from partitions.

Shane
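The memcached item in the list above can be pictured as a read-through cache of pickles keyed by object id. The sketch below is only an illustration under several assumptions: DictCache is an in-process stand-in for a memcached client (only get/set), PickleLoader and its key format are hypothetical, the backend is a plain mapping rather than a real storage, and the RelStorage checkpointing algorithm mentioned above is not modelled at all.

```python
import pickle


class DictCache:
    """In-process stand-in for a memcached client (get/set only)."""

    def __init__(self):
        self._data = {}

    def get(self, key):
        return self._data.get(key)

    def set(self, key, value):
        self._data[key] = value


class PickleLoader:
    """Hypothetical loader: shared cache first, partition storage second."""

    def __init__(self, cache, backend):
        self.cache = cache
        self.backend = backend      # mapping of oid -> pickle bytes
        self.backend_reads = 0      # counts reads that missed the cache

    def load(self, oid):
        key = f"pickle:{oid}"
        data = self.cache.get(key)
        if data is None:
            # Cache miss: fetch the pickle from the partition and share it.
            data = self.backend[oid]
            self.backend_reads += 1
            self.cache.set(key, data)
        # Unpickle on every call: no live objects are kept between requests,
        # which mirrors running with an in-process cache size of 0.
        return pickle.loads(data)


backend = {1: pickle.dumps({"name": "member-1"})}
loader = PickleLoader(DictCache(), backend)
first = loader.load(1)
second = loader.load(1)
```

After the two loads, only the first one has touched the backend; the second is served from the shared cache and rebuilt into a fresh object.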
Re: [ZODB-Dev] [OT] NoSQL
> I am very intrigued now to setup an EC2 cluster and install a z3c.sharding based solution demonstrating 100M users with some data. Mmmh...

That is the great thing about EC2: you can do massive experiments on the cheap.

Actually, one of our interns is doing some work on ZODB, mostly narrow calculations on its efficiency, i.e. how the structure of ZODB FileStorage could be changed to make better use of the disk cache, etc. Possibly interesting to the community at large.

cheers
alan
Re: [ZODB-Dev] [OT] NoSQL
On Fri, 2009-11-13 at 13:33 -0700, Shane Hathaway wrote:
> Stephan Richter wrote:
> > On Friday 13 November 2009, Roché Compaan wrote:
> > > We had such an opportunity about 2 years ago and although the client never reached (and probably will never) reach the membership they dreamed about, they did pay us to develop a storage for members that could scale to more than a 100 million members. We implemented a data partitioning strategy at application level. If I had another shot at it, I would try and develop a distributed ZODB storage, because it would be a lot simpler compared to what we had to do at application level.
> >
> > Note that Shane developed a sharding solution a year ago with me. It provides container-level partitioning.
> >
> > http://svn.zope.org/z3c.sharding/trunk

Great stuff! This approaches scaling a large data set at application level though. Don't you think a ZODB storage doing this for you would solve the problem more generally?

> Thanks for the reminder. :-)
>
> > This in combination with the encryption work that we did for the ZODB makes the ZODB actually be a lot more advanced than some of the new comers. I am very intrigued now to setup an EC2 cluster and install a z3c.sharding based solution demonstrating 100M users with some data. Mmmh...
>
> I've been studying how to build an enormous database based on what I know. There are an incredible number of distributed databases these days, but all of them concern me in one way or another. I'm wondering if ZODB might actually have a fighting chance in the distributed database realm. With z3c.sharding or something like it, I think I would set things up as follows:
>
> - In-memory ZODB caches would probably be pointlessly painful at that scale, so I would set the ZODB cache size for all partitions to 0. A cache size of 0 allows ZODB to cache for the duration of a request, but flushes all objects out of the cache at transaction boundaries.
>
> - With the cache size set to 0, we can disable cache invalidation, which will probably be a major win.
>
> - I would rely heavily on memcached to provide the pickles. I would try to use the cache checkpointing algorithm I recently added to RelStorage.
>
> - I would aim to read or write only a small number of objects per request from partitions.

I think that the master index needs to be partitioned as well. In benchmarks I performed early last year (http://bit.ly/pSVmd), a BTree could only handle about 250 inserts/second when it approached 10 million objects, so I'm guessing it will be almost unusable at 100 million.

-- 
Roché Compaan
Upfront Systems http://www.upfrontsystems.co.za
Re: [ZODB-Dev] [OT] NoSQL
Encolpe Degoute wrote:
> Is there someone in the ZODB development team following this:
> http://www.rackspacecloud.com/blog/2009/11/09/nosql-ecosystem/

It is possible that ZODB unfortunately occupies the same space as SQL in the CAP triangle:

http://camelcase.blogspot.com/2007/08/cap-theorem.html

That is to say, ZODB applications require consistency and availability, so if the CAP theorem is true, then ZODB applications cannot be very partition-tolerant.

The NoSQL databases provide availability and partition tolerance while forgoing absolute consistency.

Shane