Re: [Zope] Building a fast, scalable yet small Zope application
Hedley Roos wrote:
> I've followed this thread with interest since I have a Zope site with
> tens of millions of entries in BTrees. It scales well, but it requires
> many tricks to make it work.
>
> Roche Compaan wrote these great pieces on ZODB, Data.fs size and
> scalability at
> http://www.upfrontsystems.co.za/Members/roche/where-im-calling-from/catalog-indexes
> and
> http://www.upfrontsystems.co.za/Members/roche/where-im-calling-from/fat-doesnt-matter

Thanks for those links, interesting stuff. :-)

> My own in-house product is similar to GoogleAnalytics. I have to use a
> cascading BTree structure (a btree of btrees of btrees) to handle the
> volume. This is because BTrees do slow down the more items they
> contain. This is not a ZODB limitation or flaw - it is just how they
> work.

I'd be interested in something like Google Analytics too. It wasn't the aim of this thread, but it has been bobbing around in my head. Is this something you're thinking of releasing, or is it "too good/bad to share"?

-Morten

-- 
Morten W. Petersen
Manager
Nidelven IT Ltd
Phone: +47 45 44 00 69
Email: mor...@nidelven-it.no
___
Zope maillist - Zope@zope.org
http://mail.zope.org/mailman/listinfo/zope
** No cross posts or HTML encoding! **
(Related lists -
http://mail.zope.org/mailman/listinfo/zope-announce
http://mail.zope.org/mailman/listinfo/zope-dev )
Re: [Zope] Building a fast, scalable yet small Zope application
For huge inserts like that, have you looked at the more modern alternatives such as Tokyo Cabinet or MongoDB? I heard about an experiment to transfer 20 million text blobs into Tokyo Cabinet. The first 10 million inserts were superfast, but after that it started to take up to a second to insert each item. I'm not familiar with how good they are, but I know they both have indexing, and I'm confident they both have good Python APIs. Or watch Bob Ippolito's PyCon 2009 talk on "Drop ACID".

2009/4/27 Hedley Roos:
> I've followed this thread with interest since I have a Zope site with
> tens of millions of entries in BTrees. It scales well, but it requires
> many tricks to make it work.
>
> Roche Compaan wrote these great pieces on ZODB, Data.fs size and
> scalability at
> http://www.upfrontsystems.co.za/Members/roche/where-im-calling-from/catalog-indexes
> and
> http://www.upfrontsystems.co.za/Members/roche/where-im-calling-from/fat-doesnt-matter
> .
>
> My own in-house product is similar to GoogleAnalytics. I have to use a
> cascading BTree structure (a btree of btrees of btrees) to handle the
> volume. This is because BTrees do slow down the more items they
> contain. This is not a ZODB limitation or flaw - it is just how they
> work.
>
> My structure allows for fast inserts, but they also allow aggregation
> of data. So if my lowest level of BTrees store hits for a particular
> hour in time then the containing BTree always knows exactly how many
> hits were made in a day. I update all parent BTrees as soon as an item
> is inserted. The cost of this operation is O(1) for every parent.
> These are all details but every single one influenced my design.
>
> What is important is that you cannot just use the ZCatalog to index
> tens of millions of items since every index is a single BTree and will
> thus suffer the larger it gets. So you must roll your own to fit your
> problem domain.
>
> Data warehousing is probably a good idea as well.
>
> My problem domain allows me to defer inserts, so I have a queuerunner
> that commits larger transactions in batches. This is better than lots
> of small writes. This may of course not fit your model.
>
> Familiarize yourself with TreeSets and set operations in Python (union
> etc.) since those tools form the backbone of catalogueing.
>
> Hedley

-- 
Peter Bengtsson,
work www.fry-it.com
home www.peterbe.com
hobby www.issuetrackerproduct.com
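The deferred-insert idea quoted above (a queue runner that commits larger transactions in batches) could be sketched roughly as follows. This is only an illustration under stated assumptions: `BatchQueue`, `batch_size` and the `commit` callable are hypothetical names, and a plain dict stands in for the persistent container so the sketch runs without ZODB.

```python
class BatchQueue:
    """Collect pending inserts and apply them in batches (hypothetical helper)."""

    def __init__(self, target, commit, batch_size=1000):
        self.target = target          # mapping that receives the items
        self.commit = commit          # callable invoked once per batch
        self.batch_size = batch_size
        self.pending = []             # (key, value) pairs not yet written

    def put(self, key, value):
        """Queue one insert; write the batch once it is full."""
        self.pending.append((key, value))
        if len(self.pending) >= self.batch_size:
            self.flush()

    def flush(self):
        """Write all pending items, then commit once for the whole batch."""
        if not self.pending:
            return
        for key, value in self.pending:
            self.target[key] = value
        self.pending = []
        self.commit()


# Usage: a dict stands in for the store, and the "commit" just records
# how many items the store held at each commit.
commits = []
store = {}
queue = BatchQueue(store, commit=lambda: commits.append(len(store)), batch_size=3)
for i in range(7):
    queue.put(i, "hit %d" % i)
queue.flush()  # write the final partial batch
print(commits)  # [3, 6, 7]
```

With ZODB, the `commit` callable would typically be `transaction.commit`, so seven small writes become three commits instead of seven.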
Re: [Zope] Building a fast, scalable yet small Zope application
On Mon, Apr 27, 2009 at 17:57, Morten W. Petersen wrote:
> OK. Well, I'm concerned about how much a database would grow. I'm thinking
> if I use one BTree for all the entries, would the database grow just a
> little or a lot when you start getting into the millions of entries when
> inserting one small item?

Growth is a problem only if you are going to modify these entries a lot.

> Mm. Yes, Plone is a bit sluggish, that's why I want to write a purely
> Zope-based app.

Absolutely.

> Mm. I guess I could be OK with one "index", it being the id/path of the
> object. However, it would be nice to build for the future and include the
> ability to search all objects. Maybe a combination of the two could work.

Yeah, for full text search you would definitely benefit from the full text indexes that the catalog has.

-- 
Lennart Regebro: Python, Zope, Plone, Grok
http://regebro.wordpress.com/
+33 661 58 14 64
Re: [Zope] Building a fast, scalable yet small Zope application
I've followed this thread with interest since I have a Zope site with tens of millions of entries in BTrees. It scales well, but it requires many tricks to make it work.

Roche Compaan wrote these great pieces on ZODB, Data.fs size and scalability at
http://www.upfrontsystems.co.za/Members/roche/where-im-calling-from/catalog-indexes
and
http://www.upfrontsystems.co.za/Members/roche/where-im-calling-from/fat-doesnt-matter .

My own in-house product is similar to Google Analytics. I have to use a cascading BTree structure (a BTree of BTrees of BTrees) to handle the volume. This is because BTrees do slow down the more items they contain. This is not a ZODB limitation or flaw - it is just how they work.

My structure allows for fast inserts, but it also allows aggregation of data. So if my lowest level of BTrees stores hits for a particular hour, then the containing BTree always knows exactly how many hits were made in a day. I update all parent BTrees as soon as an item is inserted. The cost of this operation is O(1) for every parent. These are all details, but every single one influenced my design.

What is important is that you cannot just use the ZCatalog to index tens of millions of items, since every index is a single BTree and will thus suffer the larger it gets. So you must roll your own to fit your problem domain.

Data warehousing is probably a good idea as well.

My problem domain allows me to defer inserts, so I have a queue runner that commits larger transactions in batches. This is better than lots of small writes. This may of course not fit your model.

Familiarize yourself with TreeSets and set operations in Python (union etc.), since those tools form the backbone of cataloguing.

Hedley
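The cascading structure with per-parent totals described above might look roughly like the sketch below. It is not Hedley's actual code: `HitStore` and its method names are hypothetical, and nested plain dicts stand in for the BTree-of-BTrees so the example runs without ZODB. The point it illustrates is that each insert bumps one counter per level, so a daily total is read directly rather than recomputed.

```python
from datetime import datetime


class HitStore:
    """Nested mapping day -> hour -> hits, with a running count per level."""

    def __init__(self):
        # Plain dicts are an assumption for runnability; the thread
        # describes BTrees at each level.
        self.days = {}    # "YYYY-MM-DD" -> {"count": int, "hours": {...}}
        self.total = 0    # running total across all days

    def record(self, when, hit):
        """Store one hit and update the counter of every parent level."""
        day_key = when.strftime("%Y-%m-%d")
        day = self.days.setdefault(day_key, {"count": 0, "hours": {}})
        hour = day["hours"].setdefault(when.hour, {"count": 0, "hits": []})
        hour["hits"].append(hit)
        # One constant-time increment per parent level.
        hour["count"] += 1
        day["count"] += 1
        self.total += 1

    def hits_in_hour(self, day_key, hour_key):
        day = self.days.get(day_key)
        if day is None:
            return 0
        hour = day["hours"].get(hour_key)
        return hour["count"] if hour else 0

    def hits_in_day(self, day_key):
        day = self.days.get(day_key)
        return day["count"] if day else 0


store = HitStore()
store.record(datetime(2009, 4, 27, 17, 5), "/index")
store.record(datetime(2009, 4, 27, 17, 40), "/about")
store.record(datetime(2009, 4, 27, 18, 1), "/index")
print(store.hits_in_day("2009-04-27"))       # 3
print(store.hits_in_hour("2009-04-27", 17))  # 2
```

The trade-off is a few extra writes per insert in exchange for aggregate reads that never have to walk the lower levels.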
Re: [Zope] Building a fast, scalable yet small Zope application
Peter Bengtsson wrote:
> From experience I find that BTrees are very fast to write to and pick
> out items from. Even in the millions. (Never gone into the tens of
> millions or further)
> Also, when it comes to browsing stuff I find SQL faster and easier to
> work with. An added advantage of a RDBMS is that you get the indexing
> seamlessly built in (no need to bridge zbrain.getObject()) and it
> makes it easier to optimize and figure out which indexes help and
> which indexes slow you down which is something that is far from
> obvious with a ZCatalog approach.

Right. But wouldn't profiling indexes in Zope be as easy as wrapping the index search method in a function that calls time.time before and after the search? :-)

-Morten

-- 
Morten W. Petersen
Manager
Nidelven IT Ltd
Phone: +47 45 44 00 69
Email: mor...@nidelven-it.no
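The wrapping idea in the reply above can be sketched in a few lines. This is a minimal illustration, not a Zope API: `timed` and `search_index` are hypothetical names, the search function is a toy stand-in for a real index search method, and `time.perf_counter` is used in place of `time.time` because it is the higher-resolution clock for measuring intervals.

```python
import functools
import time


def timed(func, timings):
    """Wrap func so that each call appends (name, seconds) to timings."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        try:
            return func(*args, **kwargs)
        finally:
            # Recorded even if the search raises.
            timings.append((func.__name__, time.perf_counter() - start))
    return wrapper


def search_index(term):
    # Hypothetical stand-in for a real index search method.
    data = {"zope": [1, 2], "plone": [3]}
    return data.get(term, [])


timings = []
search = timed(search_index, timings)
result = search("zope")
for name, seconds in timings:
    print("%s took %.6f s" % (name, seconds))
```

Collecting the measurements in a list rather than printing inside the wrapper makes it easy to compare several indexes over many queries afterwards.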
Re: [Zope] Building a fast, scalable yet small Zope application
Lennart Regebro wrote:
> On Sat, Apr 25, 2009 at 13:24, Morten W. Petersen wrote:
>> So far, I've been contemplating disabling undo (if that's possible),
>
> I doubt that it would make a difference. The Undo functionality comes
> out of the database being logging, and changing that would mean pretty
> much a complete rewrite.

OK. Well, I'm concerned about how much a database would grow. If I use one BTree for all the entries, would the database grow just a little or a lot on inserting one small item once you get into the millions of entries?

>> and using BTree structures, maybe segmenting objects into different groups
>> (folders) to further speed up lookups.
>
> Yes, in my experience putting small objects in to BTree structures is
> quite fast. You may be talking about BTreeFolders, and in that case I
> don't know, I haven't done any sort of performance testing on those, I
> have used BTrees directly though, and that was fast. I haven't
> compared to SQL, but others have, and ZODB itself seems according to
> those tests quite fast. We know Plone slows everything down immensly
> in any case.
>
> I don't know if BTrees get slow when they get very big, so you would
> need to test that.

Mm. Yes, Plone is a bit sluggish, that's why I want to write a purely Zope-based app.

Yeah, I'll have to try different storage strategies in the ZODB, to see if a BTreeFolder containing BTrees in the [0-9|A-Z|a-z] ranges would do, or if I need to partition it up further with BTreeFolders containing BTreeFolders. On the one hand I'm concerned about lookup speed, on the other about the speed of inserts and how much the entire database will grow when inserting a < 1 KB object.

>> Should I consider using the ZCatalog for faster lookups?
>
> Maybe. You probably need to not only store the objects in BTrees, but
> also somehow have indexes. These you do by storing the values you want
> to search on in BTrees as well. The ZCatalog does this in a
> configurable way for you, so if you need configurability, yes. If not,
> it's probably faster to make your own indexes with your own BTrees.

Mm. I guess I could be OK with one "index", it being the id/path of the object. However, it would be nice to build for the future and include the ability to search all objects. Maybe a combination of the two could work.

-Morten

-- 
Morten W. Petersen
Manager
Nidelven IT Ltd
Phone: +47 45 44 00 69
Email: mor...@nidelven-it.no
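The partitioning strategy floated above (one container per leading character in the [0-9|A-Z|a-z] ranges) could be prototyped along these lines before committing to a ZODB layout. Everything here is an assumption for illustration: `PartitionedStore` is a hypothetical name, plain dicts stand in for the BTrees inside a BTreeFolder, and ids that start with any other character go to a catch-all bucket.

```python
import string

# One bucket per leading character: 10 digits + 26 upper + 26 lower = 62.
BUCKET_CHARS = string.digits + string.ascii_uppercase + string.ascii_lowercase


class PartitionedStore:
    """Route each id to a bucket chosen by its first character."""

    def __init__(self):
        # Plain dicts stand in for per-bucket BTrees (assumption).
        self.buckets = {ch: {} for ch in BUCKET_CHARS}
        self.other = {}   # ids starting with any other character, or empty

    def _bucket(self, obj_id):
        return self.buckets.get(obj_id[:1], self.other)

    def __setitem__(self, obj_id, value):
        self._bucket(obj_id)[obj_id] = value

    def __getitem__(self, obj_id):
        return self._bucket(obj_id)[obj_id]

    def __contains__(self, obj_id):
        return obj_id in self._bucket(obj_id)

    def __len__(self):
        return sum(len(b) for b in self.buckets.values()) + len(self.other)


store = PartitionedStore()
store["abc"] = "first"
store["Abc"] = "second"
store["9lives"] = "third"
store["_private"] = "fourth"
print(len(store))            # 4
print(store["9lives"])       # third
print("_private" in store)   # True
```

Whether 62 buckets is enough, or a second level is needed, is exactly the kind of question a test run with realistic id distributions would answer; ids that cluster on a few leading characters would leave the buckets very uneven.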
Re: [Zope] Building a fast, scalable yet small Zope application
> I suggest you experiment a bit. Create 100 million objects, and do
> some of the actions you are planning to do on them.

Right. I'm thinking of taking the time to try a simple SQL-based implementation, as well as one in ZODB. I need to learn more about high-speed Zope programming, as well as keep my SQL skills up to date. :-)

-Morten

-- 
Morten W. Petersen
Manager
Nidelven IT Ltd
Phone: +47 45 44 00 69
Email: mor...@nidelven-it.no
Re: [Zope] Building a fast, scalable yet small Zope application
From experience I find that BTrees are very fast to write to and pick out items from, even in the millions. (I have never gone into the tens of millions or further.)

Also, when it comes to browsing stuff I find SQL faster and easier to work with. An added advantage of an RDBMS is that you get the indexing seamlessly built in (no need to bridge zbrain.getObject()), and it makes it easier to optimize and figure out which indexes help and which indexes slow you down, which is far from obvious with a ZCatalog approach.

2009/4/25 Morten W. Petersen:
> Hi,
>
> I'm considering building a large scale, but small in features site. It
> will contain
> lots of small objects (millions, tens of millions, hundreds of millions)
> of objects,
> where each object has a couple of strings and maybe some other light
> attributes.
>
> So far, I've been contemplating disabling undo (if that's possible), and
> using
> BTree structures, maybe segmenting objects into different groups
> (folders) to
> further speed up lookups. Scalability is also an issue, should I
> consider using
> RelStorage? Should I consider using the ZCatalog for faster lookups?
>
> Has anyone else developed something similar? Are there Zope product
> examples out there that fit the bill?
>
> -Morten
>
> --
> Morten W. Petersen
> Manager
> Nidelven IT Ltd
>
> Phone: +47 45 44 00 69
> Email: mor...@nidelven-it.no

-- 
Peter Bengtsson,
work www.fry-it.com
home www.peterbe.com
hobby www.issuetrackerproduct.com
Re: [Zope] Building a fast, scalable yet small Zope application
On Sat, Apr 25, 2009 at 13:24, Morten W. Petersen wrote:
> So far, I've been contemplating disabling undo (if that's possible),

I doubt that it would make a difference. The Undo functionality is a consequence of the database being log-based, and changing that would mean pretty much a complete rewrite.

> and using BTree structures, maybe segmenting objects into different groups
> (folders) to further speed up lookups.

Yes, in my experience putting small objects into BTree structures is quite fast. You may be talking about BTreeFolders, and in that case I don't know: I haven't done any sort of performance testing on those. I have used BTrees directly though, and that was fast. I haven't compared to SQL, but others have, and according to those tests ZODB itself seems quite fast. We know Plone slows everything down immensely in any case.

I don't know if BTrees get slow when they get very big, so you would need to test that.

> Should I consider using the ZCatalog for faster lookups?

Maybe. You probably need to not only store the objects in BTrees, but also have indexes of some kind. You build these by storing the values you want to search on in BTrees as well. The ZCatalog does this in a configurable way for you, so if you need configurability, yes. If not, it's probably faster to make your own indexes with your own BTrees.

-- 
Lennart Regebro: Python, Zope, Plone, Grok
http://regebro.wordpress.com/
+33 661 58 14 64
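A hand-rolled index of the kind suggested above (store the values you want to search on, each pointing at the ids that carry it) can be sketched as follows. This is an illustration under assumptions, not ZCatalog's implementation: `FieldIndex`, `search_all` and the `author`/`tag` fields are hypothetical, and a dict of Python sets stands in for a BTree of TreeSets so the sketch runs without ZODB.

```python
class FieldIndex:
    """Map each field value to the set of object ids that have it."""

    def __init__(self):
        self.forward = {}   # value -> set of object ids

    def index(self, obj_id, value):
        self.forward.setdefault(value, set()).add(obj_id)

    def unindex(self, obj_id, value):
        ids = self.forward.get(value)
        if ids is not None:
            ids.discard(obj_id)
            if not ids:
                del self.forward[value]   # drop empty entries

    def search(self, value):
        # Return a copy so callers cannot mutate the index by accident.
        return set(self.forward.get(value, ()))


def search_all(criteria):
    """AND several (index, value) pairs by intersecting their id sets."""
    result = None
    for index, value in criteria:
        ids = index.search(value)
        result = ids if result is None else result & ids
    return result if result is not None else set()


author = FieldIndex()
tag = FieldIndex()
author.index(1, "morten")
author.index(2, "hedley")
author.index(3, "morten")
tag.index(1, "zope")
tag.index(2, "zope")
tag.index(3, "sql")

print(search_all([(author, "morten"), (tag, "zope")]))  # {1}
```

Multi-field queries reduce to set intersection (and OR queries to union), which is why set operations matter so much once the indexes are your own.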
Re: [Zope] Building a fast, scalable yet small Zope application
On 25.04.2009 at 13:24, Morten W. Petersen wrote:
> Hi,
>
> I'm considering building a large scale, but small in features site. It
> will contain lots of small objects (millions, tens of millions, hundreds
> of millions) of objects, where each object has a couple of strings and
> maybe some other light attributes.
>
> So far, I've been contemplating disabling undo (if that's possible), and
> using BTree structures, maybe segmenting objects into different groups
> (folders) to further speed up lookups. Scalability is also an issue,
> should I consider using RelStorage? Should I consider using the ZCatalog
> for faster lookups?

This description is too thin to give any kind of hint, since it does not contain any information about your data model etc. Did you consider using an RDBMS? Anyway, you need to provide more information.

-aj