Re: [ZODB-Dev] New to ZODB, how to make a db efficiently?

2008-08-19 Thread Christian Theune
Hi,

On Mon, 2008-08-18 at 20:17 +0300, Markus wrote:
> I'm new here, so hi! ;-)
>
> I'm looking to create a database of persons and events, later to
> search persons by names, and events by dates and locations (the
> participants of an event are already stored in an attribute of the
> event and are instances of Person, which inherits from Persistent).
>
> At first I made a PersistentList of all the events and a
> PersistentMapping of all the people by an id, but later found out
> that searching through a list with a for-loop is very slow (there are
> about 200 000 people and 100 000 events). So, having looked around
> here a bit (the docs and the wikis are mostly outdated or empty --
> there's also talk about the bad documentation on this mailing list),
> I've found that I should be using OOBTree for making the indexes.

Yes, the documentation situation is less than desirable for
beginners. :/

> So what I'm asking is: is it reasonable to create the db like this:
> persons in root['persons'], which is an OOBTree mapping names to
> Person objects, and events in root['events'], which is also an
> OOBTree, mapping dates to Event objects? And if I want to map
> locations to events, should I do it at the same time as creating the
> events, so I don't have to loop through all of them again?

Here's what I do:

Create a physical structure that models your data in a 'natural' way.
This can e.g. be:

- A root object representing the application, in case you may want to
  hold multiple instances of your application within a single database.

- BTrees for storing large collections of objects, as you already do,
  but mainly with a single lookup direction, e.g. the name-to-person
  mapping in your case.

  Sometimes those lists just use arbitrary IDs as keys for the objects,
  much like primary keys in tables.

  Alternatively, if you have a VFS-like structure, you might want to use
  the folder/item metaphor for the main structure of your database.

- Add an indexing/searching framework for orthogonal queries. This is
  called `cataloging` in the Zope/ZODB universe. Some (more or less)
  standalone solutions can be found in and around `zope.catalog`.

  Use them to create tabular views of your data (independent of the
  physical structure) that can be queried by indexed attributes. Such
  queries are fast.

> If I have an OOBTree mapping dates to events, what should its values
> be? PersistentLists? I've read something about Buckets and Sets, but
> I'm not sure what they are good for. A Bucket seems to behave like the
> equivalent BTree (OO, IO, OI, IF, ...), but a Set seems to be a
> set... Is that true?

I'd go with a flat structure. See my note on 'arbitrary' IDs above.

> What's the difference between a PersistentMapping and an OOBTree or
> OOBucket? Only the back-end, because on the front they all seem like
> dictionaries? Should I be using OOBTrees and OOBuckets for what I'm
> doing, because strings and dates are Os and not Is or Fs or...

A PersistentMapping (PM) is a persistent dictionary that is stored as a
single record, so it loads all of its data at once.

A bucket is a leaf node of a BTree: a small, sorted mapping that is
stored as a single persistent object.

A BTree is a data structure sorted by key(!) that provides a key/value
interface like a dictionary. Because of that, looking up items in a
BTree is fast and also memory efficient: only the nodes on the path to
the key need to be activated for a lookup, i.e. O(log n) nodes and a
single bucket.

Christian

-- 
Christian Theune · [EMAIL PROTECTED]
gocept gmbh & co. kg · forsterstraße 29 · 06112 halle (saale) · germany
http://gocept.com · tel +49 345 1229889 7 · fax +49 345 1229889 1
Zope and Plone consulting and development


___
For more information about ZODB, see the ZODB Wiki:
http://www.zope.org/Wikis/ZODB/

ZODB-Dev mailing list  -  ZODB-Dev@zope.org
http://mail.zope.org/mailman/listinfo/zodb-dev


[ZODB-Dev] New to ZODB, how to make a db efficiently?

2008-08-18 Thread Markus
I'm new here, so hi! ;-)

I'm looking to create a database of persons and events, later to
search persons by names, and events by dates and locations (the
participants of an event are already stored in an attribute of the
event and are instances of Person, which inherits from Persistent).

At first I made a PersistentList of all the events and a
PersistentMapping of all the people by an id, but later found out
that searching through a list with a for-loop is very slow (there are
about 200 000 people and 100 000 events). So, having looked around
here a bit (the docs and the wikis are mostly outdated or empty --
there's also talk about the bad documentation on this mailing list),
I've found that I should be using OOBTree for making the indexes.

So what I'm asking is: is it reasonable to create the db like this:
persons in root['persons'], which is an OOBTree mapping names to
Person objects, and events in root['events'], which is also an
OOBTree, mapping dates to Event objects? And if I want to map
locations to events, should I do it at the same time as creating the
events, so I don't have to loop through all of them again?

If I have an OOBTree mapping dates to events, what should its values
be? PersistentLists? I've read something about Buckets and Sets, but
I'm not sure what they are good for. A Bucket seems to behave like the
equivalent BTree (OO, IO, OI, IF, ...), but a Set seems to be a
set... Is that true?

What's the difference between a PersistentMapping and an OOBTree or
OOBucket? Only the back-end, because on the front they all seem like
dictionaries? Should I be using OOBTrees and OOBuckets for what I'm
doing, because strings and dates are Os and not Is or Fs or...

Thank you very much for any hints, tips, links or help! ;-)

Markus