Re: [Zope] Building a fast, scalable yet small Zope application

2009-04-27 Thread Peter Bengtsson
From experience I find that BTrees are very fast to write to and pick
out items from. Even in the millions. (Never gone into the tens of
millions or further)
Also, when it comes to browsing stuff I find SQL faster and easier to
work with. An added advantage of an RDBMS is that you get the indexing
seamlessly built in (no need to bridge zbrain.getObject()), and it
makes it easier to optimize and figure out which indexes help and
which slow you down, something that is far from obvious with a
ZCatalog approach.
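
To illustrate the point about seeing which indexes help, here is a minimal
sketch using Python's bundled sqlite3 module. The table, column and index
names (items, owner, idx_items_owner) are made up for the example, and SQLite
only stands in for whichever RDBMS you would actually use.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, owner TEXT, title TEXT)")
conn.executemany(
    "INSERT INTO items (owner, title) VALUES (?, ?)",
    [("user%d" % (i % 100), "item %d" % i) for i in range(10000)],
)


def query_plan(sql, params=()):
    """Return the plan text SQLite reports for a query."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    # The last column of each row holds the human-readable description.
    return " | ".join(row[-1] for row in rows)


query = "SELECT title FROM items WHERE owner = ?"
plan_before = query_plan(query, ("user7",))  # no usable index yet: a table scan
conn.execute("CREATE INDEX idx_items_owner ON items (owner)")
plan_after = query_plan(query, ("user7",))   # the plan now names the new index
print(plan_before)
print(plan_after)
```

The exact wording of the plan differs between SQLite versions, but the second
plan should mention idx_items_owner while the first should not.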

2009/4/25 Morten W. Petersen mor...@nidelven-it.no:
 Hi,

 I'm considering building a large-scale site that is small in features.  It
 will contain lots of small objects (millions, tens of millions, hundreds of
 millions), where each object has a couple of strings and maybe some other
 light attributes.

 So far, I've been contemplating disabling undo (if that's possible) and
 using BTree structures, maybe segmenting objects into different groups
 (folders) to further speed up lookups.  Scalability is also an issue: should
 I consider using RelStorage?  Should I consider using the ZCatalog for
 faster lookups?

 Has anyone else developed something similar?  Are there Zope product
 examples out there that fit the bill?

 -Morten

 --
 Morten W. Petersen
 Manager
 Nidelven IT Ltd

 Phone: +47 45 44 00 69
 Email: mor...@nidelven-it.no

 ___
 Zope maillist  -  z...@zope.org
 http://mail.zope.org/mailman/listinfo/zope
 **   No cross posts or HTML encoding!  **
 (Related lists -
  http://mail.zope.org/mailman/listinfo/zope-announce
  http://mail.zope.org/mailman/listinfo/zope-dev )




-- 
Peter Bengtsson,
work www.fry-it.com
home www.peterbe.com
hobby www.issuetrackerproduct.com


Re: [Zope] Building a fast, scalable yet small Zope application

2009-04-27 Thread Morten W. Petersen

 I suggest you experiment a bit. Create 100 million objects, and do
 some of the actions you are planning to do on them.
   

Right.  I'm thinking of taking the time to try a simple SQL-based
implementation as well as one in ZODB.  I need to learn more about
high-speed Zope programming, and to keep my SQL skills up to date.  :-)
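
A rough sketch of how such an experiment could be timed, assuming you only
want to see whether inserts get slower as the store grows. The helper
time_inserts and the dict-backed insert_into_dict are hypothetical; you would
swap in a function that writes to ZODB or to SQL, and raise total to the
volume you expect.

```python
import time


def time_inserts(insert, total, chunk):
    """Call insert(i) for i in range(total); return seconds spent per chunk."""
    timings = []
    for start in range(0, total, chunk):
        t0 = time.perf_counter()
        for i in range(start, min(start + chunk, total)):
            insert(i)
        timings.append(time.perf_counter() - t0)
    return timings


store = {}  # stand-in for the real storage under test


def insert_into_dict(i):
    store["key%09d" % i] = ("a string", "another string")


chunk_times = time_inserts(insert_into_dict, total=100_000, chunk=10_000)
for n, seconds in enumerate(chunk_times):
    print("chunk %d: %.4f s" % (n, seconds))
```

If later chunks take clearly longer than early ones, the storage is slowing
down with size, which is the thing to look for before committing to a design.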

-Morten

-- 
Morten W. Petersen
Manager
Nidelven IT Ltd

Phone: +47 45 44 00 69
Email: mor...@nidelven-it.no



Re: [Zope] Building a fast, scalable yet small Zope application

2009-04-27 Thread Morten W. Petersen
Lennart Regebro wrote:
 On Sat, Apr 25, 2009 at 13:24, Morten W. Petersen mor...@nidelven-it.no 
 wrote:
   
 So far, I've been contemplating disabling undo (if that's possible),
 

 I doubt that it would make a difference. The Undo functionality comes
 from the database being a logging database, and changing that would mean
 pretty much a complete rewrite.
   

OK.  Well, I'm concerned about how much the database would grow.  If I use
one BTree for all the entries, would the database grow just a little or a
lot for each small item inserted once I get into the millions of entries?

 and using BTree structures, maybe segmenting objects into different groups
 (folders) to further speed up lookups.
 

 Yes, in my experience putting small objects into BTree structures is
 quite fast. You may be talking about BTreeFolders, and in that case I
 don't know; I haven't done any sort of performance testing on those. I
 have used BTrees directly though, and that was fast. I haven't
 compared to SQL, but others have, and according to those tests ZODB
 itself seems quite fast. We know Plone slows everything down immensely
 in any case.

 I don't know if BTrees get slow when they get very big, so you would
 need to test that.
   

Mm.  Yes, Plone is a bit sluggish, that's why I want to write a purely
Zope-based app.

Yeah, I'll have to try different storage strategies in the ZODB, to see
if a BTreeFolder containing BTrees in the [0-9|A-Z|a-z] ranges would do,
or if I need to partition it up further with BTreeFolders containing
BTreeFolders.
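
A minimal sketch of that first-character partitioning, with plain dicts
standing in for the BTrees so it runs without ZODB. PartitionedStore is a
hypothetical name, and keys that start with any other character (or are
empty) go into a catch-all bucket.

```python
import string

# One bucket per character in the [0-9|A-Z|a-z] ranges.
BUCKET_CHARS = string.digits + string.ascii_uppercase + string.ascii_lowercase


class PartitionedStore:
    """Container that spreads keys over sub-containers by first character."""

    def __init__(self):
        self.buckets = {ch: {} for ch in BUCKET_CHARS}
        self.other = {}  # catch-all for keys outside the ranges above

    def _bucket(self, key):
        return self.buckets.get(key[:1], self.other)

    def __setitem__(self, key, value):
        self._bucket(key)[key] = value

    def __getitem__(self, key):
        return self._bucket(key)[key]

    def __contains__(self, key):
        return key in self._bucket(key)

    def __len__(self):
        return len(self.other) + sum(len(b) for b in self.buckets.values())


store = PartitionedStore()
store["apple"] = "a fruit"
store["Zope"] = "an application server"
store["42"] = "a number"
store["_private"] = "goes to the catch-all bucket"
```

Real ids are rarely spread evenly over their first character, so some buckets
may end up far larger than others; that is one of the things worth measuring.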

On the one hand I'm concerned about lookup speed, on the other about the
speed of inserts and how much the entire database will grow when inserting
a 1 KB object.

  Should I consider using the ZCatalog for faster lookups?
 

 Maybe. You probably need to not only store the objects in BTrees, but
 also somehow have indexes. These you do by storing the values you want
 to search on in BTrees as well. The ZCatalog does this in a
 configurable way for you, so if you need configurability, yes. If not,
 it's probably faster to make your own indexes with your own BTrees.
   

Mm.  I guess I could be OK with one index, it being the id/path of the
object.  However, it would be nice to build for the future and include the
ability to search all objects.  Maybe a combination of the two could work.

-Morten

-- 
Morten W. Petersen
Manager
Nidelven IT Ltd

Phone: +47 45 44 00 69
Email: mor...@nidelven-it.no



Re: [Zope] Building a fast, scalable yet small Zope application

2009-04-27 Thread Morten W. Petersen
Peter Bengtsson wrote:
 From experience I find that BTrees are very fast to write to and pick
 out items from. Even in the millions. (Never gone into the tens of
 millions or further)
 Also, when it comes to browsing stuff I find SQL faster and easier to
 work with. An added advantage of an RDBMS is that you get the indexing
 seamlessly built in (no need to bridge zbrain.getObject()), and it
 makes it easier to optimize and figure out which indexes help and
 which slow you down, something that is far from obvious with a
 ZCatalog approach.
   

Right.  But wouldn't profiling indexes in Zope be as easy as wrapping the
index search method in a function that does time.time before and after
the search?  :-)
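
Something along these lines, as a sketch. It uses time.perf_counter rather
than time.time because it is better suited to measuring short intervals;
timed and search_index are hypothetical names, and search_index only stands
in for a real index search method.

```python
import functools
import time


def timed(func, log):
    """Wrap func so that every call appends (name, seconds) to log."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        try:
            return func(*args, **kwargs)
        finally:
            # Record the duration even if the search raises.
            log.append((func.__name__, time.perf_counter() - start))
    return wrapper


def search_index(index, value):
    """Hypothetical stand-in for an index search method."""
    return index.get(value, set())


timings = []
search = timed(search_index, timings)
result = search({"red": {1, 2}}, "red")
print(result, timings)
```

This tells you how long each search takes, though not by itself which index
a combined catalog query spent its time in unless every index is wrapped.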

-Morten

-- 
Morten W. Petersen
Manager
Nidelven IT Ltd

Phone: +47 45 44 00 69
Email: mor...@nidelven-it.no



Re: [Zope] Building a fast, scalable yet small Zope application

2009-04-27 Thread Hedley Roos
I've followed this thread with interest since I have a Zope site with
tens of millions of entries in BTrees. It scales well, but it requires
many tricks to make it work.

Roche Compaan wrote these great pieces on ZODB, Data.fs size and
scalability:
http://www.upfrontsystems.co.za/Members/roche/where-im-calling-from/catalog-indexes
and
http://www.upfrontsystems.co.za/Members/roche/where-im-calling-from/fat-doesnt-matter

My own in-house product is similar to GoogleAnalytics. I have to use a
cascading BTree structure (a btree of btrees of btrees) to handle the
volume. This is because BTrees do slow down the more items they
contain. This is not a ZODB limitation or flaw - it is just how they
work.

My structure allows for fast inserts, but it also allows aggregation
of data. So if my lowest level of BTrees stores hits for a particular
hour, then the containing BTree always knows exactly how many
hits were made in a day. I update all parent BTrees as soon as an item
is inserted. The cost of this operation is O(1) for every parent.
These are all details but every single one influenced my design.
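
A small sketch of the idea, not the actual in-house code: plain dicts stand
in for the nested BTrees, and HitStore with its methods is a hypothetical
name. Each insert stores the hit at the lowest level and bumps one counter
per parent level, so the per-day total never has to be recomputed.

```python
from collections import defaultdict


class HitStore:
    """Day -> hour -> list of hits, with a running count kept per level."""

    def __init__(self):
        self.days = defaultdict(lambda: defaultdict(list))
        self.day_counts = defaultdict(int)
        self.total = 0

    def insert(self, day, hour, hit):
        self.days[day][hour].append(hit)
        # One constant-time counter update for every parent level.
        self.day_counts[day] += 1
        self.total += 1

    def hits_in_day(self, day):
        return self.day_counts.get(day, 0)

    def hits_in_hour(self, day, hour):
        # .get() avoids creating empty entries for days that have no hits.
        return len(self.days.get(day, {}).get(hour, []))


hits = HitStore()
hits.insert("2009-04-27", 9, "/index")
hits.insert("2009-04-27", 9, "/about")
hits.insert("2009-04-27", 10, "/index")
hits.insert("2009-04-26", 23, "/contact")
print(hits.hits_in_day("2009-04-27"), hits.total)
```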

What is important is that you cannot just use the ZCatalog to index
tens of millions of items since every index is a single BTree and will
thus suffer the larger it gets. So you must roll your own to fit your
problem domain.

Data warehousing is probably a good idea as well.

My problem domain allows me to defer inserts, so I have a queuerunner
that commits larger transactions in batches. This is better than lots
of small writes. This may of course not fit your model.
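
The batching could be sketched like this. BatchWriter is a hypothetical name
and the commit callable is an assumption: in a real setup it would write the
whole batch to the database and commit one transaction, here it just collects
the batches in a list.

```python
class BatchWriter:
    """Buffer items and hand them to commit() in batches."""

    def __init__(self, commit, batch_size=1000):
        self.commit = commit          # callable taking a list of items
        self.batch_size = batch_size
        self.pending = []

    def add(self, item):
        self.pending.append(item)
        if len(self.pending) >= self.batch_size:
            self.flush()

    def flush(self):
        """Commit whatever is buffered; call this on shutdown as well."""
        if self.pending:
            batch, self.pending = self.pending, []
            self.commit(batch)


committed = []
writer = BatchWriter(committed.append, batch_size=3)
for n in range(7):
    writer.add(n)
writer.flush()
print(committed)
```

Items sitting in the buffer are lost if the process dies before a flush, so
this only fits when deferred, slightly lossy inserts are acceptable.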

Familiarize yourself with TreeSets and set operations in Python (union
etc.) since those tools form the backbone of cataloguing.
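
As a sketch of what that looks like, using built-in Python sets in place of
TreeSets so it runs anywhere: each index maps a value to the set of object
ids that have it, and a query combines those sets. The index contents and the
helpers query_and and query_or are invented for the example.

```python
# Each index maps a value to the ids of the objects carrying that value.
color_index = {"red": {1, 2, 5}, "blue": {3, 4}}
size_index = {"small": {1, 3, 5}, "large": {2, 4}}


def query_and(*id_sets):
    """Ids present in every set; starts from the smallest set."""
    if not id_sets:
        return set()
    ordered = sorted(id_sets, key=len)
    result = set(ordered[0])
    for ids in ordered[1:]:
        result &= ids
        if not result:
            break  # nothing can match any more
    return result


def query_or(*id_sets):
    """Ids present in at least one set."""
    return set().union(*id_sets)


small_and_red = query_and(color_index["red"], size_index["small"])
blue_or_large = query_or(color_index["blue"], size_index["large"])
print(small_and_red, blue_or_large)
```

Starting the intersection from the smallest set keeps the intermediate result
small, which matters once the id sets hold millions of entries.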

Hedley


Re: [Zope] Building a fast, scalable yet small Zope application

2009-04-27 Thread Lennart Regebro
On Mon, Apr 27, 2009 at 17:57, Morten W. Petersen mor...@nidelven-it.no wrote:
 OK.  Well, I'm concerned about how much the database would grow.  If I use
 one BTree for all the entries, would the database grow just a little or a
 lot for each small item inserted once I get into the millions of entries?

Growth is a problem only if you are going to modify these entries a lot.
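
A toy model of why, assuming an append-only storage (as ZODB's FileStorage
is), where every modification appends a new revision and old revisions stay
on disk until the storage is packed. AppendOnlyLog and its methods are
hypothetical and not ZODB API; the model ignores the extra records written
for the containers that hold the objects.

```python
class AppendOnlyLog:
    """Toy append-only store: size follows writes, not live objects."""

    def __init__(self):
        self.records = []   # every revision ever written, in order
        self.current = {}   # key -> position of the latest revision

    def write(self, key, value):
        self.current[key] = len(self.records)
        self.records.append((key, value))

    def read(self, key):
        return self.records[self.current[key]][1]

    def pack(self):
        """Drop every revision that is no longer the current one."""
        live = [self.records[pos] for pos in sorted(self.current.values())]
        self.records = live
        self.current = {key: i for i, (key, _) in enumerate(live)}


log = AppendOnlyLog()
for n in range(1000):
    log.write("item%d" % n, "created")
size_after_inserts = len(log.records)       # one record per object

for revision in range(500):
    log.write("item0", "edit %d" % revision)
size_after_edits = len(log.records)         # grows with every modification

log.pack()
size_after_pack = len(log.records)          # back to one record per object
print(size_after_inserts, size_after_edits, size_after_pack)
```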

 Mm.  Yes, Plone is a bit sluggish, that's why I want to write a purely
 Zope-based app.

Absolutely.

 Mm.  I guess I could be OK with one index, it being the id/path of the
 object.  However, it would be nice to build for the future and include the
 ability to search all objects.  Maybe a combination of the two could work.

Yeah, for full text search you would definitely benefit from the full
text indexes that the catalog has.

-- 
Lennart Regebro: Python, Zope, Plone, Grok
http://regebro.wordpress.com/
+33 661 58 14 64


Re: [Zope] Building a fast, scalable yet small Zope application

2009-04-27 Thread Peter Bengtsson
For huge inserts like that, have you looked at the more modern
alternatives such as Tokyo Cabinet or MongoDB?
I heard about an experiment to transfer 20 million text blobs into
Tokyo Cabinet. The first 10 million inserts were superfast, but after
that it started to take up to a second to insert each item.
I'm not familiar with how good they are, but I know they both have
indexing. And I'm confident they both have good Python APIs.
Or watch Bob Ippolito's PyCon 2009 talk on Drop ACID.

2009/4/27 Hedley Roos hedleyr...@gmail.com:
 I've followed this thread with interest since I have a Zope site with
 tens of millions of entries in BTrees. It scales well, but it requires
 many tricks to make it work.

 Roche Compaan wrote these great pieces on ZODB, Data.fs size and
 scalability at 
 http://www.upfrontsystems.co.za/Members/roche/where-im-calling-from/catalog-indexes
 and 
 http://www.upfrontsystems.co.za/Members/roche/where-im-calling-from/fat-doesnt-matter
 .

 My own in-house product is similar to GoogleAnalytics. I have to use a
 cascading BTree structure (a btree of btrees of btrees) to handle the
 volume. This is because BTrees do slow down the more items they
 contain. This is not a ZODB limitation or flaw - it is just how they
 work.

 My structure allows for fast inserts, but it also allows aggregation
 of data. So if my lowest level of BTrees stores hits for a particular
 hour in time then the containing BTree always knows exactly how many
 hits were made in a day. I update all parent BTrees as soon as an item
 is inserted. The cost of this operation is O(1) for every parent.
 These are all details but every single one influenced my design.

 What is important is that you cannot just use the ZCatalog to index
 tens of millions of items since every index is a single BTree and will
 thus suffer the larger it gets. So you must roll your own to fit your
 problem domain.

 Data warehousing is probably a good idea as well.

 My problem domain allows me to defer inserts, so I have a queuerunner
 that commits larger transactions in batches. This is better than lots
 of small writes. This may of course not fit your model.

 Familiarize yourself with TreeSets and set operations in Python (union
 etc.) since those tools form the backbone of cataloguing.

 Hedley




-- 
Peter Bengtsson,
work www.fry-it.com
home www.peterbe.com
hobby www.issuetrackerproduct.com


Re: [Zope] Building a fast, scalable yet small Zope application

2009-04-27 Thread Morten W. Petersen
Hedley Roos wrote:
 I've followed this thread with interest since I have a Zope site with
 tens of millions of entries in BTrees. It scales well, but it requires
 many tricks to make it work.

 Roche Compaan wrote these great pieces on ZODB, Data.fs size and
 scalability at 
 http://www.upfrontsystems.co.za/Members/roche/where-im-calling-from/catalog-indexes
 and 
 http://www.upfrontsystems.co.za/Members/roche/where-im-calling-from/fat-doesnt-matter
   

Thanks for those links, interesting stuff.  :-)

 My own in-house product is similar to GoogleAnalytics. I have to use a
 cascading BTree structure (a btree of btrees of btrees) to handle the
 volume. This is because BTrees do slow down the more items they
 contain. This is not a ZODB limitation or flaw - it is just how they
 work.
   

Something like Google Analytics I'd be interested in too; it wasn't the
aim for this thread but something that's been bobbing around in my head.
Is this something you're thinking of releasing, or is it too good/bad to
share?

-Morten

-- 
Morten W. Petersen
Manager
Nidelven IT Ltd

Phone: +47 45 44 00 69
Email: mor...@nidelven-it.no



Re: [Zope] Building a fast, scalable yet small Zope application

2009-04-26 Thread Lennart Regebro
On Sat, Apr 25, 2009 at 13:24, Morten W. Petersen mor...@nidelven-it.no wrote:
 So far, I've been contemplating disabling undo (if that's possible),

I doubt that it would make a difference. The Undo functionality comes
from the database being a logging database, and changing that would mean
pretty much a complete rewrite.

 and using BTree structures, maybe segmenting objects into different groups
 (folders) to further speed up lookups.

Yes, in my experience putting small objects into BTree structures is
quite fast. You may be talking about BTreeFolders, and in that case I
don't know; I haven't done any sort of performance testing on those. I
have used BTrees directly though, and that was fast. I haven't
compared to SQL, but others have, and according to those tests ZODB
itself seems quite fast. We know Plone slows everything down immensely
in any case.

I don't know if BTrees get slow when they get very big, so you would
need to test that.

 Should I consider using the ZCatalog for faster lookups?

Maybe. You probably need to not only store the objects in BTrees, but
also somehow have indexes. These you do by storing the values you want
to search on in BTrees as well. The ZCatalog does this in a
configurable way for you, so if you need configurability, yes. If not,
it's probably faster to make your own indexes with your own BTrees.
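
A hand-rolled index of that kind might look roughly like this. It is a sketch
with plain dicts and sets standing in for BTrees and TreeSets, and FieldIndex
with its method names is hypothetical rather than any ZCatalog API. The
reverse mapping is there so an object can be unindexed or re-indexed without
scanning every value.

```python
class FieldIndex:
    """Index one attribute: value -> set of object ids."""

    def __init__(self):
        self.forward = {}  # value -> set of object ids
        self.reverse = {}  # object id -> indexed value

    def index_object(self, oid, value):
        self.unindex_object(oid)  # drop any stale entry first
        self.forward.setdefault(value, set()).add(oid)
        self.reverse[oid] = value

    def unindex_object(self, oid):
        if oid not in self.reverse:
            return
        value = self.reverse.pop(oid)
        ids = self.forward[value]
        ids.discard(oid)
        if not ids:
            del self.forward[value]  # keep the index free of empty sets

    def search(self, value):
        return set(self.forward.get(value, ()))


index = FieldIndex()
index.index_object(1, "morten")
index.index_object(2, "lennart")
index.index_object(3, "morten")
index.index_object(3, "peter")   # re-indexing moves id 3 to the new value
index.unindex_object(2)
print(index.search("morten"), index.search("peter"))
```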

-- 
Lennart Regebro: Python, Zope, Plone, Grok
http://regebro.wordpress.com/
+33 661 58 14 64


Re: [Zope] Building a fast, scalable yet small Zope application

2009-04-25 Thread Andreas Jung


On 25.04.2009 at 13:24, Morten W. Petersen wrote:

 Hi,

 I'm considering building a large-scale site that is small in features.  It
 will contain lots of small objects (millions, tens of millions, hundreds of
 millions), where each object has a couple of strings and maybe some other
 light attributes.

 So far, I've been contemplating disabling undo (if that's possible) and
 using BTree structures, maybe segmenting objects into different groups
 (folders) to further speed up lookups.  Scalability is also an issue: should
 I consider using RelStorage?  Should I consider using the ZCatalog for
 faster lookups?


This description is too vague to give any kind of hint, since it does not
contain any information about your data model etc. Did you consider
using an RDBMS? Anyway, you need to provide more information.

-aj
