Re: [Zope-dev] NEO High Performance Distributed Fault Tolerant ZODB Storage

2010-04-07 Thread Vincent Pelletier
On Tuesday 6 April 2010 at 18:54:24, Jim Fulton wrote:
 Is there a document you can point to that provides a description of the
 approach used?

A document is in the works; in the meantime I'll try to describe the architecture
briefly (it should be readable to people who know a bit about TPC) and give some
typical use cases.

NEO is distributed, with different types of nodes involved.
The 3 main types are:
- Client nodes: what gets embedded in any application using NEO, so typically
  Zope + ZODB
- Master nodes: one of them gets elected (becoming the primary master node,
  the others becoming secondary master nodes). The primary handles all
  centralised tasks, such as:
  - oid/tid generation
  - cluster consistency checking (high level: are there enough nodes to cover
    all data)
  - broadcasting cluster changes to all nodes (new storage node, storage node
    getting disconnected, etc.)
  When a primary master fails, the secondaries take over by restarting an
  election.
  There is no persistent data on a master node, aside from its configuration
  (the name of the NEO cluster it belongs to, the addresses of the other master
  nodes and its own listening address); a rough sketch of this configuration
  follows this list.
- Storage nodes: they contain object data, stored and accessed through them
  and ultimately kept in a local storage backend (MySQL currently, but all
  NEO really needs is something which can guarantee some atomicity and limited
  index lookups)
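
To give an idea of how little state a master node keeps, here is a rough Python
sketch of that configuration; the class and attribute names are mine, not NEO's
actual configuration keys:

# Illustrative only: the attribute names are invented for this sketch.
class MasterConfig(object):
    """The only persistent state a master node needs."""
    def __init__(self, cluster_name, other_masters, bind_address):
        self.cluster_name = cluster_name      # name of the NEO cluster it belongs to
        self.other_masters = other_masters    # [(host, port), ...] of the other masters
        self.bind_address = bind_address      # its own listening (host, port)

config = MasterConfig(
    cluster_name='neo-example',
    other_masters=[('master2.example.com', 10000), ('master3.example.com', 10000)],
    bind_address=('0.0.0.0', 10000),
)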

Other nodes (existing or planned) are:
- Admin node
  Cluster monitoring (human-level: book-keeping of cluster health)
- Control command
  Administrator CLI tool to trigger cluster actions.
  Technical note: for now, it uses the admin node as a stepping stone in all
  cases, though this could be avoided for some of them.
- Backup node
  Technically very similar to a storage node, but it only dumps data out of
  storage nodes, and will probably be a CLI tool rather than a daemon (not
  implemented yet).

Remarks:
- there is currently no security in NEO (cryptography/authentication)
- expected cluster size is around 1000 (client + storage) nodes, probably
  more (it depends on the node-type ratio and the usage pattern)

Some sample use cases:
Client wants data
The client connects to the primary master node (this happens once upon startup,
and again if the primary master dies). During the initial handshake, the client
receives the partition table, which tells it which storage nodes are part
of the cluster and which one contains which object. The client then connects to
one of the storage nodes expected to contain the desired data, and asks to be
served.
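
To make the lookup concrete, here is a rough Python sketch of a client-side
partition table; the modulo-on-oid scheme and all names are illustrative, not
necessarily what NEO implements:

import struct

def oid_to_partition(oid, partition_count):
    """Assumed scheme: unpack the 8-byte oid as a big-endian integer, then modulo."""
    return struct.unpack('>Q', oid)[0] % partition_count

class ClientPartitionTable(object):
    def __init__(self, partition_count, cells):
        self.partition_count = partition_count
        self.cells = cells   # {partition number: [(host, port), ...]}

    def storage_nodes_for(self, oid):
        return self.cells[oid_to_partition(oid, self.partition_count)]

# The client picks one of the returned nodes and asks it to serve the object.
pt = ClientPartitionTable(12, {n: [] for n in range(12)})
pt.cells[5] = [('storage-a', 20000), ('storage-b', 20000)]
print(pt.storage_nodes_for(struct.pack('>Q', 5)))   # oid 5 falls in partition 5 here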

Client wants to commit
The client is already known to the cluster and enters TPC in ZODB: for each
object, it sends the object data to all candidate storage nodes (found by
looking them up in its partition table). Each storage node handles locking
locally at the object level, a write lock being taken upon such a data
transmission. Once the store phase is over, the vote phase starts; its outcome
depends on whether any storage node refuses the data (the base version of an
object not being the latest one at commit time, etc.), which leads to conflict
resolution and ultimately to a ConflictError exception.
The client then notifies the primary master of the ZODB decision (finish or
abort), which in turn asks all involved storage nodes to make the changes
persistent or discard them (releasing the object-level write locks).
If the decision was to finish, the involved storage nodes take a read lock on
the objects (used as a barrier) and answer the master once this lock is
acquired. In turn, the master asks them to release all locks (the barrier
effect is achieved, and the write locks are released to allow further
transactions) and sends invalidation requests to all clients.
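
The same flow, written out as a rough Python sketch; every call here (store,
vote, finish, the conflict-resolution hook) is a hypothetical stand-in for
NEO's actual messages, not its real API:

class NEOConflictError(Exception):
    """Stand-in for ZODB's ConflictError in this sketch."""

def commit(txn, partition_table, master, resolve_conflict):
    # Store phase: send each object to every storage node holding its partition;
    # a storage node takes a per-object write lock when it accepts the data.
    for oid, data, base_tid in txn.objects:
        for storage in partition_table.storage_nodes_for(oid):
            storage.store(oid, data, base_tid, txn.tid)

    # Vote phase: a storage node refuses data whose base revision is no longer
    # the latest; the client then attempts conflict resolution and, failing
    # that, raises ConflictError back to the application.
    for oid, data, base_tid in txn.objects:
        for storage in partition_table.storage_nodes_for(oid):
            answer = storage.vote(oid, txn.tid)
            if answer.conflict and not resolve_conflict(oid, data, answer):
                raise NEOConflictError(oid)

    # Finish: the client only talks to the primary master from here on; the
    # master asks the involved storage nodes to make the data persistent, take
    # the read lock (barrier), then release all locks and broadcast
    # invalidations to every client.
    master.finish(txn.tid)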

New storage enters cluster
When a storage node enters an existing (and working) cluster, it is assigned
some partitions in order to achieve load balancing. It connects by itself to
the storage nodes holding data for those partitions and starts replicating
them. Those partitions may or may not ultimately be dropped from their
original container nodes, depending on the data replication constraints (how
many copies of each partition must exist in the whole cluster).
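
Roughly, the joining node does something like the following for each partition
it was assigned; the iteration and method names are invented for this sketch:

def replicate_assigned_partitions(assigned_partitions, partition_table, local_backend):
    # Sketch only: pull each assigned partition from one node that already
    # holds an up-to-date copy of it, and store the objects locally.
    for partition in assigned_partitions:
        sources = partition_table.cells[partition]
        if not sources:
            continue                      # nobody holds this partition: nothing to copy
        source = sources[0]               # one up-to-date replica is enough
        for oid, tid, data in source.iter_partition(partition):
            local_backend.store(oid, tid, data)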

Storage dies
When a storage node dies (disconnection, or a request timeout), it is marked as
temporarily down in the primary master's partition table (a change which is
then broadcast to all other nodes). This means that the storage node might
still have its data, but is currently unavailable. If the lost storage
contained the last copy of any partition, the cluster has lost part of its
content: it asks all nodes to interrupt service (they stay running, but refuse
to serve further requests until the cluster gets back to a running state).
Note that currently, losing a storage node does not automatically trigger the
action symmetric to adding one: partitions are not reassigned to the existing
nodes. This can only be triggered by manual action, via the admin node and its
command-line tool. This was chosen to avoid the cluster wasting time balancing
data over and over.
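
For illustration, here is roughly what the reaction on the primary master could
look like; the cell states and method names are invented for this sketch, not
NEO's actual code:

UP_TO_DATE = 'up-to-date'
TEMPORARILY_DOWN = 'temporarily-down'

def on_storage_lost(partition_table, lost_node, cluster):
    # Mark every cell served by the lost node as temporarily down, then
    # broadcast the updated partition table to all other nodes.
    for cells in partition_table.cells.values():
        for cell in cells:
            if cell.node == lost_node:
                cell.state = TEMPORARILY_DOWN
    cluster.broadcast_partition_table(partition_table)

    # If some partition no longer has any up-to-date copy, part of the content
    # is unreachable: ask all nodes to stop serving requests until the cluster
    # is brought back to a running state.
    for cells in partition_table.cells.values():
        if not any(cell.state == UP_TO_DATE for cell in cells):
            cluster.interrupt_service()
            break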

Re: [Zope-dev] NEO High Performance Distributed Fault Tolerant ZODB Storage

2010-04-07 Thread Jim Fulton
On Wed, Apr 7, 2010 at 8:50 AM, Vincent Pelletier vinc...@nexedi.com wrote:
 On Tuesday 6 April 2010 at 18:54:24, Jim Fulton wrote:
 Is there a document you can point to that provides a description of the
 approach used?

 A document is in the works; in the meantime I'll try to describe the architecture
 briefly (it should be readable to people who know a bit about TPC) and give some
 typical use cases.

...

Thanks for sharing some details. The hints of the architecture are intriguing.
I look forward to studying this in more detail when the document is available.

...

 You mean, a presentation of NEO

Yes.

 (maybe what I wrote above would fit) ?

More details would be good. Of course, I could look at the source, but
given the license, I'm not highly motivated.

Jim

-- 
Jim Fulton


Re: [Zope-dev] NEO High Performance Distributed Fault Tolerant ZODB Storage

2010-04-06 Thread Jim Fulton
On Wed, Mar 31, 2010 at 4:32 AM, Vincent Pelletier vinc...@nexedi.com wrote:
 Hi,

 I would like to present to you the NEOPPOD project, which aims at improving ZODB
 Storage scalability. The implementation is in rather good shape, although it
 fails a few ZODB tests at the moment (they are currently being worked on).
 Scalability is achieved by distributing data over multiple servers
 (replication and load balancing), with the ability to extend/reduce the cluster
 on-line.

Is there a document you can point to that provides a description of the
approach used?

 Its code is available under the GPL,

That's unfortunate.  Why not a less restrictive license?

 [1] http://www.neoppod.org/
...
 [3] http://www.myerp5.com/kb/enterprise-High.Performance.Zope/view

These seem to be very high level.
You provide a link to the source, which is rather low level.
Anything in between?

Jim

-- 
Jim Fulton


Re: [Zope-dev] NEO High Performance Distributed Fault Tolerant ZODB Storage

2010-04-01 Thread Vincent Pelletier
On Wednesday 31 March 2010 at 18:32:31, you wrote:
 A few questions that you may want to add in a FAQ.

We have started that page and will publish it very soon, based on most of the
points you raised. Other pages are also being worked on, such as an overview of
a simple NEO cluster.

   - Why not include ZODB 3.10/Python 2.6 as a goal of the project?
 - I understand *today* the technologies use python 2.4 but
   ZODB 3.10/Plone 4/Zope 2.12 use python 2.6

We do indeed aim at supporting more recent versions of Python and Zope.

Actually, your remark made us realise that our functional tests are currently
(accidentally) running in a mixed 2.4/2.5 Python environment: the test process is
started explicitly with 2.4, and the forked processes (master, storage and admin
nodes) run on the default Python, which is 2.5 (as of Debian stable).

The standard Zope version at Nexedi is 2.8, which explains why we want to
support it. We will switch to 2.12, as we have had ERP5 unit tests running on 2.12
for some weeks[1] now. NEO will move to 2.12 at the same time or earlier.

   - Maybe explain the goal of the project clearer:
 
 NEO provides distributed, redundant and transactional storage designed
 for petabytes of persistent (python?) objects.
 

Thanks, updated.

   - A buildout for NEO would lower bar for evaluation

This is on our roadmap (...to be published along with the FAQ), but priority
currently goes to two developments which might or will break compatibility:
pack support (it required an undo rework which was recently integrated; pack
itself needs more unit testing prior to integration) and multi-export support
(aka ZODB mount points, also in need of more testing before integration).

[1] http://mail.nexedi.com/pipermail/erp5-report/ (_z212 in subject)
-- 
Vincent Pelletier


Re: [Zope-dev] NEO High Performance Distributed Fault Tolerant ZODB Storage

2010-03-31 Thread Alan Runyan
On Wed, Mar 31, 2010 at 3:32 AM, Vincent Pelletier vinc...@nexedi.com wrote:

 Hi,

 I would like to present to you the NEOPPOD project, which aims at improving ZODB
 Storage scalability. The implementation is in rather good shape, although it
 fails a few ZODB tests at the moment (they are currently being worked on).
 Scalability is achieved by distributing data over multiple servers
 (replication and load balancing), with the ability to extend/reduce the cluster
 on-line.


Congrats!

A few questions that you may want to add in a FAQ.

  - NEO replaces FileStorage? Maybe typo?
  - What is the gain of using NEO over relstorage + mysql replication?
  - Why not include ZODB 3.10/Python 2.6 as a goal of the project?
- I understand *today* the technologies use python 2.4 but
  ZODB 3.10/Plone 4/Zope 2.12 use python 2.6
  - NEO is a different protocol than ZEO?
  - What is the Blob story with NEO?
  - Any issues with 32bit vs 64bit
  - Backup/restore strategy of NEO

Other notes:
  - Maybe explain the goal of the project clearer:

NEO provides distributed, redundant and transactional storage designed
for petabytes of persistent (python?) objects.

  - A buildout for NEO would lower bar for evaluation
  - How do you plan on storing petabytes in a single MySQL server, since
    that is the data structure backend for NEO?

Looking forward to reading the Petri net article - please send an update when
it comes out.

cheers
alan