Re: [ZODB-Dev] [OT] NoSQL

2009-11-15 Thread Roché Compaan
On Sun, 2009-11-15 at 00:31 -0700, Shane Hathaway wrote:
 Roché Compaan wrote:
  On Sat, 2009-11-14 at 14:23 -0700, Shane Hathaway wrote:
  I think proper construction of horizontally scalable databases must be 
  done partly at application level, since a lot of the issues to be solved 
  are specific to the application.
  
  What are the issues you're talking about?
 
 Every database system has almost countless issues to balance, such as 
 durability, consistency, performance, freshness, availability, etc.  The 
 demands of horizontal scaling make the issues too complex to completely 
 delegate to a database layer.

These concerns don't disappear when implementing a solution to big
databases at application level. In my experience it becomes even more
complex at application level and you have to do an inordinate amount of
configuration to manage partitions. With the z3c.sharding implementation
you would have to configure multiple containers.

I can't see why it wouldn't be possible to develop a ZODB storage
similar to Hypertable:
http://code.google.com/p/hypertable/wiki/ArchitecturalOverview


-- 
Roché Compaan
Upfront Systems   http://www.upfrontsystems.co.za

___
For more information about ZODB, see the ZODB Wiki:
http://www.zope.org/Wikis/ZODB/

ZODB-Dev mailing list  -  ZODB-Dev@zope.org
https://mail.zope.org/mailman/listinfo/zodb-dev


Re: [ZODB-Dev] [OT] NoSQL

2009-11-14 Thread Wichert Akkerman
On 11/13/09 21:33 , Shane Hathaway wrote:
 I've been studying how to build an enormous database based on what I
 know.  There are an incredible number of distributed databases these
 days, but all of them concern me in one way or another.

Can you share some of those concerns with us? I'd be interested to hear 
what kind of problems you see.

Wichert.


Re: [ZODB-Dev] [OT] NoSQL

2009-11-14 Thread Shane Hathaway
Roché Compaan wrote:
 On Fri, 2009-11-13 at 13:33 -0700, Shane Hathaway wrote:
 Stephan Richter wrote:
 http://svn.zope.org/z3c.sharding/trunk
 
 Great stuff! This approaches scaling a large data set at application
 level though. Don't you think a ZODB storage doing this for you would
 solve the problem more generally?

I think proper construction of horizontally scalable databases must be 
done partly at application level, since a lot of the issues to be solved 
are specific to the application.

 I think that the master index needs to be partitioned as well. In
 benchmarks I performed early last year (http://bit.ly/pSVmd), a BTree
 could only handle about 250 inserts / second when it approached 10
 million objects, so I'm guessing it will be almost unusable at 100
 million.

Right.  The top level probably ought to be a dynamic hash, not a BTree. 
  I intended z3c.sharding more as a proof of concept.

Shane



Re: [ZODB-Dev] [OT] NoSQL

2009-11-14 Thread Shane Hathaway
Wichert Akkerman wrote:
 On 11/13/09 21:33 , Shane Hathaway wrote:
 I've been studying how to build an enormous database based on what I
 know.  There are an incredible number of distributed databases these
 days, but all of them concern me in one way or another.
 
 Can you share some of those concerns with us? I'd be interested to hear 
 what kind of problems you see.

The best article I've found is a simple presentation and overview:

http://highscalability.com/blog/2009/11/5/a-yes-for-a-nosql-taxonomy.html

The article neatly categorizes a lot of the NoSQL databases.  It 
suggests that document stores have the right level of complexity.  Wide 
columnar stores like Cassandra could be too complicated to gain a lot of 
traction, while simpler databases might lack features we would normally 
take for granted.

In other articles, I learned about CouchDB conflict resolution.  CouchDB 
allows any conflict and it stores both conflicting values, expecting the 
application to resolve the conflict later.  Clearly, CouchDB is designed 
to solve the PDA use case: I change a contact's phone number differently 
on my PDA and my desktop, then when I sync, I click some UI button to 
indicate which is correct.  I think that sort of conflict resolution 
would cause security holes for the application I am working on, but it 
would probably work for a lot of other applications.
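
A minimal sketch of the behaviour described above, as a toy in-memory model rather than CouchDB's actual API. The class name ConflictKeepingStore and its methods are hypothetical, chosen only for illustration: a sync never rejects a write, both values are kept, and the application picks a winner later.

```python
class ConflictKeepingStore:
    """Toy store that never rejects a write; divergent values are kept side by side."""

    def __init__(self):
        self._values = {}  # key -> list of candidate values

    def put(self, key, value):
        self._values[key] = [value]

    def merge_from(self, other):
        """Sync with another replica, keeping every distinct value for a key."""
        for key, candidates in other._values.items():
            mine = self._values.setdefault(key, [])
            for value in candidates:
                if value not in mine:
                    mine.append(value)

    def conflicts(self, key):
        """Return the competing values for a key, or [] if there is no conflict."""
        values = self._values.get(key, [])
        return list(values) if len(values) > 1 else []

    def resolve(self, key, chosen):
        """Application-level resolution: keep only the chosen candidate."""
        if chosen not in self._values.get(key, []):
            raise ValueError("chosen value is not one of the stored candidates")
        self._values[key] = [chosen]

    def get(self, key):
        return self._values[key][0]


pda = ConflictKeepingStore()
desktop = ConflictKeepingStore()
pda.put("alice/phone", "555-0100")
desktop.put("alice/phone", "555-0199")

desktop.merge_from(pda)                      # sync: both values survive
pending = desktop.conflicts("alice/phone")   # two candidates await a decision
desktop.resolve("alice/phone", "555-0100")   # the "UI button" step
```

Until resolve() runs, the store holds two values for the same key, and which one a reader acts on is left to the application.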

Current versions of CouchDB expect applications to scale using 
replication.  Replication is not a substitute for sharding.  The 
couchdb-lounge project seems to be solving that with proxies:

http://tilgovi.github.com/couchdb-lounge/

The MongoDB guys have a pretty thorough and fair comparison of CouchDB 
and MongoDB:

http://www.mongodb.org/display/DOCS/Comparing+Mongo+DB+and+Couch+DB

The bottom of the page lists use cases for MongoDB.  It says people 
building a system with very critical transactions should choose a 
traditional RDBMS.  That seems like reasonable advice for the 
application I'm building, except that I consider ZODB to be at least as 
reliable as an RDBMS.  (RelStorage uses a subset of RDBMS functionality 
that I have found to be reliable.)

I think that by "very critical", the MongoDB authors are referring to 
applications that must not allow conflicting updates.  Conflict 
resolution is probably my main concern with all of these new databases. 
  I have no doubts about ZODB's conflict resolution policy, while I can 
imagine a variety of different policies these other databases might 
implement.  A four- or five-dimensional hash like Cassandra might even 
have a conflict resolution policy that changes with every release.
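
For comparison, ZODB's policy is explicit: a conflicting write raises a ConflictError unless the object's class provides a _p_resolveConflict(oldState, savedState, newState) method. The sketch below shows the usual counter example. It is a plain-Python stand-in (an assumption made so that it runs without ZODB installed); in a real application the class would subclass persistent.Persistent and ZODB would call the method with the three state dictionaries.

```python
class Counter:
    """Plain-Python stand-in for a persistent counter object."""

    def __init__(self):
        self.value = 0

    def _p_resolveConflict(self, old_state, saved_state, new_state):
        # old_state:   the state both transactions started from
        # saved_state: the state already committed by the other transaction
        # new_state:   the state this transaction is trying to commit
        resolved = dict(new_state)
        # Apply both increments on top of the common starting value.
        resolved["value"] = (
            saved_state["value"] + new_state["value"] - old_state["value"]
        )
        return resolved


old = {"value": 10}
saved = {"value": 13}   # the other transaction added 3
new = {"value": 15}     # this transaction added 5
resolved = Counter()._p_resolveConflict(old, saved, new)
```

Because the merge rule lives in the class, it is written once by the application author and applied the same way on every commit.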

Shane


Re: [ZODB-Dev] [OT] NoSQL

2009-11-14 Thread Roché Compaan
On Sat, 2009-11-14 at 14:23 -0700, Shane Hathaway wrote:
 Roché Compaan wrote:
  On Fri, 2009-11-13 at 13:33 -0700, Shane Hathaway wrote:
  Stephan Richter wrote:
  http://svn.zope.org/z3c.sharding/trunk
  
  Great stuff! This approaches scaling a large data set at application
  level though. Don't you think a ZODB storage doing this for you would
  solve the problem more generally?
 
 I think proper construction of horizontally scalable databases must be 
 done partly at application level, since a lot of the issues to be solved 
 are specific to the application.

What are the issues you're talking about?

  I think that the master index needs to be partitioned as well. In
  benchmarks I performed early last year (http://bit.ly/pSVmd), a BTree
  could only handle about 250 inserts / second when it approached 10
  million objects, so I'm guessing it will be almost unusable at 100
  million.
 
 Right.  The top level probably ought to be a dynamic hash, not a BTree. 
   I intended z3c.sharding more as a proof of concept.

Sure, just thought I'd mention it. 

Also keep in mind that the dynamic hash needs to handle the introduction
of new partitions.
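
One well-known way to meet that requirement is consistent hashing. The sketch below is only an illustration of that technique, not what z3c.sharding does; HashRing, the shard names and the replica count are all hypothetical. Each partition owns many points on a ring, so adding a partition takes over only the keys that fall next to its new points.

```python
import bisect
import hashlib


def _hash(value: str) -> int:
    """Map a string to a stable integer (md5 is used only to spread keys)."""
    return int(hashlib.md5(value.encode("utf-8")).hexdigest(), 16)


class HashRing:
    """Toy consistent-hash ring: each partition owns many points on the ring."""

    def __init__(self, partitions=(), replicas=100):
        self.replicas = replicas
        self._points = []   # sorted hash points
        self._owner = {}    # hash point -> partition name
        for name in partitions:
            self.add_partition(name)

    def add_partition(self, name):
        for i in range(self.replicas):
            point = _hash(f"{name}#{i}")
            self._owner[point] = name
            bisect.insort(self._points, point)

    def partition_for(self, key):
        # First point clockwise from the key's hash, wrapping around at the end.
        index = bisect.bisect(self._points, _hash(key)) % len(self._points)
        return self._owner[self._points[index]]


ring = HashRing(["shard-a", "shard-b", "shard-c"])
keys = [f"member-{n}" for n in range(10000)]
before = {key: ring.partition_for(key) for key in keys}

ring.add_partition("shard-d")
after = {key: ring.partition_for(key) for key in keys}
moved = [key for key in keys if before[key] != after[key]]
```

Every key in moved now belongs to the new partition; keys never shuffle between the existing ones, which is what keeps the cost of adding a partition bounded.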


-- 
Roché Compaan
Upfront Systems   http://www.upfrontsystems.co.za



Re: [ZODB-Dev] [OT] NoSQL

2009-11-14 Thread Shane Hathaway
Roché Compaan wrote:
 On Sat, 2009-11-14 at 14:23 -0700, Shane Hathaway wrote:
 I think proper construction of horizontally scalable databases must be 
 done partly at application level, since a lot of the issues to be solved 
 are specific to the application.
 
 What are the issues you're talking about?

Every database system has almost countless issues to balance, such as 
durability, consistency, performance, freshness, availability, etc.  The 
demands of horizontal scaling make the issues too complex to completely 
delegate to a database layer.

 Also keep in mind that the dynamic hash needs to handle the introduction
 of new partitions.

Certainly.

Shane



Re: [ZODB-Dev] [OT] NoSQL

2009-11-14 Thread Andreas Jung
On 14.11.09 23:33, Shane Hathaway wrote:

 
  I think that by "very critical", the MongoDB authors are referring to 
  applications that must not allow conflicting updates.  Conflict 
  resolution is probably my main concern with all of these new databases. 
  I have no doubts about ZODB's conflict resolution policy, while I can 
  imagine a variety of different policies these other databases might 
  implement.  A four- or five-dimensional hash like Cassandra might even 
  have a conflict resolution policy that changes with every release.
   
My high-level comment:

Choose the right tool for each individual problem. We have been building
hybrid applications on top of Zope using the ZODB and an RDBMS for years.
I can imagine building a Web-2.0-ish application on top of an RDBMS for
storing personal data (where transactional integrity is a must) and
using something like MongoDB for mass data. You have to analyze which
data are important and which are less important, and then choose the
backend accordingly.

On MongoDB: I ran lots of tests with MongoDB lately and found it pretty
amazing, fast and reliable. The replication support in particular looks
good, and the sharding functionality (although still alpha or beta)
appears promising. But the speed has its price: atomicity is guaranteed
only for single documents.

Andreas



Andreas Jung, CEO
ZOPYX Ltd. & Co. KG
www.zopyx.com  mailto:i...@zopyx.com



Re: [ZODB-Dev] [OT] NoSQL

2009-11-13 Thread Adam GROSZER
Hello,

I think we can look at this at 2 levels.

1. At the application level: your app uses ZODB alongside a NoSQL
contender that it talks to directly. Then this is your app's
problem, and it's your responsibility to deal with it.

2. At the ZODB storage level: as far as I can see, that level needs
consistency, transactions and locking support. Those are usually
missing from NoSQL implementations (unless I have missed some).
OTOH a key-value store would fit the ZODB storage model.
If someone finds or writes a key-value store that has the above
properties, we could give it a try.
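
To make the three properties concrete, here is a minimal sketch of a key-value store that has them in toy form. TransactionalKV and its methods are hypothetical names; this is not the ZODB storage interface, which uses a two-phase commit (tpc_begin, store, tpc_vote, tpc_finish) that this sketch collapses into a single commit() call.

```python
import threading


class ConflictError(Exception):
    """Raised when a key changed after the transaction read it."""


class TransactionalKV:
    """Toy key-value store: buffered writes, a commit lock, and version checks."""

    def __init__(self):
        self._data = {}                       # key -> (version, value)
        self._commit_lock = threading.Lock()

    def load(self, key):
        """Return (version, value); a missing key has version 0."""
        return self._data.get(key, (0, None))

    def commit(self, writes):
        """Apply writes atomically.

        writes maps key -> (version_read, new_value). Either every key is
        updated or none is.
        """
        with self._commit_lock:                          # locking
            for key, (version_read, _) in writes.items():
                current_version, _ = self.load(key)
                if current_version != version_read:      # consistency check
                    raise ConflictError(key)
            for key, (version_read, value) in writes.items():
                self._data[key] = (version_read + 1, value)


store = TransactionalKV()
store.commit({"a": (0, "first")})
version, value = store.load("a")
store.commit({"a": (version, "second")})

try:
    # Reuses the old version of "a", so the whole batch must be rejected.
    store.commit({"a": (version, "stale"), "b": (0, "x")})
    stale_rejected = False
except ConflictError:
    stale_rejected = True
```

The stale batch leaves both keys untouched: "a" keeps its second value and "b" is never created.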

Thursday, November 12, 2009, 8:24:43 PM, you wrote:

SH Encolpe Degoute wrote:
 Is there someone in the ZODB development team following this:
 http://www.rackspacecloud.com/blog/2009/11/09/nosql-ecosystem/

SH It is possible that ZODB unfortunately occupies the same space as SQL in
SH the CAP triangle:

SH http://camelcase.blogspot.com/2007/08/cap-theorem.html

SH That is to say, ZODB applications require consistency and availability,
SH so if the CAP theorem is true, then ZODB applications cannot be very 
SH partition-tolerant.

SH The NoSQL databases provide availability and partition tolerance while
SH forgoing absolute consistency.

SH Shane


-- 
Best regards,
 Adam GROSZER  mailto:agros...@gmail.com
--
Quote of the day:
For a good time, call 836-3100.



Re: [ZODB-Dev] [OT] NoSQL

2009-11-13 Thread Roché Compaan
On Fri, 2009-11-13 at 10:58 +0100, Christian Theune wrote:
 On 11/13/2009 10:42 AM, Adam GROSZER wrote:
  Hello,
  
  I think we can look at this at 2 levels.
  
  1.: As your app uses ZODB. Then this is your app's
  problem/responsibility. You use a nosql contender directly from
  your app and it's your responsibility to deal with it.
  
  2.: On the ZODB Storage level. So far I can see that level needs
  consistency, transactions and locking support. Those are usually
  missing from nosql implementations (unless I miss some).
  OTOH a key-value store would fit the ZODB storage.
  If someone finds/writes a key-value storage that has the above
  properties we could give it a try.
 
 Looking at the article referenced by Shane I understand that we'd have
 to drop consistency for being able to use such a store -- I feel that's
 not a good idea in the face of ZODB. The applications we write are
 intended to run consistently without having application-level (or even
 user-based) reconciliation work to do if a transaction came through.

I think there is something more important to notice than dropping of
consistency, and that is scaling easily to handle very large datasets.

Having to implement a data partitioning strategy at application level is
very difficult to get right. Having a ZODB storage that is distributed
across machines could become a big selling point for the ZODB and would
make it very convenient for those of us who occasionally get the
opportunity to write applications that expect data sets in excess of 100
million records.

We had such an opportunity about 2 years ago and although the client
never reached (and probably never will reach) the membership they
dreamed about, they did pay us to develop a storage for members that
could scale to more than 100 million members. We implemented a data
partitioning strategy at application level. If I had another shot at it,
I would try and develop a distributed ZODB storage, because it would be
a lot simpler compared to what we had to do at application level.

-- 
Roché Compaan
Upfront Systems   http://www.upfrontsystems.co.za



Re: [ZODB-Dev] [OT] NoSQL

2009-11-13 Thread Stephan Richter
On Friday 13 November 2009, Roché Compaan wrote:
 We had such an opportunity about 2 years ago and although the client
 never reached (and probably never will reach) the membership they
 dreamed about, they did pay us to develop a storage for members that
 could scale to more than 100 million members. We implemented a data
 partitioning strategy at application level. If I had another shot at it,
 I would try and develop a distributed ZODB storage, because it would be
 a lot simpler compared to what we had to do at application level.

Note that Shane developed a sharding solution a year ago with me. It provides 
container-level partitioning.

http://svn.zope.org/z3c.sharding/trunk

This, in combination with the encryption work that we did for the ZODB, 
actually makes the ZODB a lot more advanced than some of the newcomers.

I am very intrigued now to set up an EC2 cluster and install a 
z3c.sharding-based solution demonstrating 100M users with some data. Mmmh...

Regards,
Stephan
-- 
Entrepreneur and Software Geek
Google me. Zope Stephan Richter


Re: [ZODB-Dev] [OT] NoSQL

2009-11-13 Thread Shane Hathaway
Stephan Richter wrote:
 On Friday 13 November 2009, Roché Compaan wrote:
 We had such an opportunity about 2 years ago and although the client
 never reached (and probably never will reach) the membership they
 dreamed about, they did pay us to develop a storage for members that
 could scale to more than 100 million members. We implemented a data
 partitioning strategy at application level. If I had another shot at it,
 I would try and develop a distributed ZODB storage, because it would be
 a lot simpler compared to what we had to do at application level.
 
 Note that Shane developed a sharding solution a year ago with me. It provides 
 container-level partitioning.
 
 http://svn.zope.org/z3c.sharding/trunk

Thanks for the reminder. :-)

 This, in combination with the encryption work that we did for the ZODB, 
 actually makes the ZODB a lot more advanced than some of the newcomers.
 
 I am very intrigued now to set up an EC2 cluster and install a 
 z3c.sharding-based solution demonstrating 100M users with some data. Mmmh...

I've been studying how to build an enormous database based on what I 
know.  There are an incredible number of distributed databases these 
days, but all of them concern me in one way or another.  I'm wondering 
if ZODB might actually have a fighting chance in the distributed 
database realm.  With z3c.sharding or something like it, I think I would 
set things up as follows:

- In-memory ZODB caches would probably be pointlessly painful at that 
scale, so I would set the ZODB cache size for all partitions to 0.  A 
cache size of 0 allows ZODB to cache for the duration of a request, but 
flushes all objects out of the cache at transaction boundaries.

- With the cache size set to 0, we can disable cache invalidation, which 
will probably be a major win.

- I would rely heavily on memcached to provide the pickles.  I would try 
to use the cache checkpointing algorithm I recently added to RelStorage.

- I would aim to read or write only a small number of objects per 
request from partitions.
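
A rough sketch of the read path this setup implies, under loud assumptions: a plain dict stands in for memcached, another dict stands in for a partition, and CachedLoader is a hypothetical name. It is not RelStorage's checkpointing algorithm, only the basic read-through idea of serving pickles from a shared cache and keeping nothing in process memory between requests.

```python
import pickle


class CachedLoader:
    """Read-through cache of pickles in front of a slower backing store."""

    def __init__(self, backend, cache):
        self.backend = backend    # oid -> object; stands in for a partition
        self.cache = cache        # oid -> pickle bytes; stands in for memcached
        self.backend_reads = 0

    def load(self, oid):
        data = self.cache.get(oid)
        if data is None:
            # Cache miss: read from the partition and remember the pickle.
            self.backend_reads += 1
            data = pickle.dumps(self.backend[oid])
            self.cache[oid] = data
        # Unpickle a fresh copy on every load, so no object state is shared
        # between requests (loosely mirroring a ZODB cache size of 0).
        return pickle.loads(data)

    def store(self, oid, obj):
        self.backend[oid] = obj
        self.cache[oid] = pickle.dumps(obj)   # write-through keeps the cache fresh


loader = CachedLoader(backend={1: {"name": "alice"}}, cache={})
first = loader.load(1)
second = loader.load(1)
```

The second load is served from the cache, so the partition is read only once for the two requests.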

Shane



Re: [ZODB-Dev] [OT] NoSQL

2009-11-13 Thread Alan Runyan
 I am very intrigued now to set up an EC2 cluster and install a
 z3c.sharding-based solution demonstrating 100M users with some data. Mmmh...

That is the great thing about EC2.  You can do massive experiments on the cheap.

Actually, one of our interns is doing some work on ZODB.  He is mostly
doing narrow calculations on the efficiency of ZODB, i.e. how the
structure of ZODB FileStorage could be changed to make better use of the
disk cache, etc.  Possibly interesting to the community at large.

cheers
alan


Re: [ZODB-Dev] [OT] NoSQL

2009-11-13 Thread Roché Compaan
On Fri, 2009-11-13 at 13:33 -0700, Shane Hathaway wrote:
 Stephan Richter wrote:
  On Friday 13 November 2009, Roché Compaan wrote:
  We had such an opportunity about 2 years ago and although the client
  never reached (and probably never will reach) the membership they
  dreamed about, they did pay us to develop a storage for members that
  could scale to more than 100 million members. We implemented a data
  partitioning strategy at application level. If I had another shot at it,
  I would try and develop a distributed ZODB storage, because it would be
  a lot simpler compared to what we had to do at application level.
  
  Note that Shane developed a sharding solution a year ago with me. It
  provides container-level partitioning.
  
  http://svn.zope.org/z3c.sharding/trunk

Great stuff! This approaches scaling a large data set at application
level though. Don't you think a ZODB storage doing this for you would
solve the problem more generally?

 Thanks for the reminder. :-)
 
  This, in combination with the encryption work that we did for the ZODB, 
  actually makes the ZODB a lot more advanced than some of the newcomers.
  
  I am very intrigued now to set up an EC2 cluster and install a 
  z3c.sharding-based solution demonstrating 100M users with some data. Mmmh...
 
 I've been studying how to build an enormous database based on what I 
 know.  There are an incredible number of distributed databases these 
 days, but all of them concern me in one way or another.  I'm wondering 
 if ZODB might actually have a fighting chance in the distributed 
 database realm.  With z3c.sharding or something like it, I think I would 
 set things up as follows:
 
 - In-memory ZODB caches would probably be pointlessly painful at that 
 scale, so I would set the ZODB cache size for all partitions to 0.  A 
 cache size of 0 allows ZODB to cache for the duration of a request, but 
 flushes all objects out of the cache at transaction boundaries.
 
 - With the cache size set to 0, we can disable cache invalidation, which 
 will probably be a major win.
 
 - I would rely heavily on memcached to provide the pickles.  I would try 
 to use the cache checkpointing algorithm I recently added to RelStorage.
 
 - I would aim to read or write only a small number of objects per 
 request from partitions.

I think that the master index needs to be partitioned as well. In
benchmarks I performed early last year (http://bit.ly/pSVmd), a BTree
could only handle about 250 inserts / second when it approached 10
million objects, so I'm guessing it will be almost unusable at 100
million.
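
For a sense of scale, here is the arithmetic under the optimistic assumption that the rate stays flat at the 250 inserts / second seen near 10 million objects. That flat rate is an assumption for illustration only, not a measured result for larger trees.

```python
inserts_per_second = 250        # rate reported near 10 million objects
target_objects = 100_000_000    # the 100 million mark discussed above

seconds = target_objects / inserts_per_second
hours = seconds / 3600
days = seconds / 86400

print(f"{seconds:,.0f} s = {hours:,.1f} h = {days:.1f} days")
# prints: 400,000 s = 111.1 h = 4.6 days
```

So a single unpartitioned index would need more than four and a half days of continuous inserting just to load the data, even before any further slowdown.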


-- 
Roché Compaan
Upfront Systems   http://www.upfrontsystems.co.za



Re: [ZODB-Dev] [OT] NoSQL

2009-11-12 Thread Shane Hathaway
Encolpe Degoute wrote:
 Is there someone in the ZODB development team following this:
 http://www.rackspacecloud.com/blog/2009/11/09/nosql-ecosystem/

It is possible that ZODB unfortunately occupies the same space as SQL in 
the CAP triangle:

http://camelcase.blogspot.com/2007/08/cap-theorem.html

That is to say, ZODB applications require consistency and availability, 
so if the CAP theorem is true, then ZODB applications cannot be very 
partition-tolerant.

The NoSQL databases provide availability and partition tolerance while 
forgoing absolute consistency.

Shane