On Wed, Dec 11, 2013 at 3:27 AM, Tamer Higazi <tamerito...@arcor.de> wrote:
> Perhaps it's my fault, the way I stored the data.
> The point is, that I am looking the fastest and performant way to grab the
> data from a big pool.

I do not think folks are suggesting you iterate over the "big pool"
keys/values/items.  Rather, folks are suggesting that you consider
using BTrees to make your own indexes, if you choose not to use
search/catalog/indexing tools like Hypatia or repoze.catalog (with or
without Souper).  For each field of data you care to search, you need
an index that can yield identifiers that match the keys in your "big
pool" of data (which I assume is also stored in a BTree).
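
To make that concrete, here is a minimal sketch of the idea (my own
illustration, not code from any of the libraries above).  The record
layout, the ids, and the names pool, city_index and find_by_city are
all made up for the example, and the fallback to dict/set is only
there so the snippet runs without the BTrees package installed:

```python
try:
    from BTrees.OOBTree import OOBTree, OOTreeSet
except ImportError:
    # Fallback so the sketch runs without the BTrees package; in a real
    # ZODB application you would want the persistent BTrees types.
    OOBTree, OOTreeSet = dict, set

# The "big pool": record id -> record (hypothetical sample data).
pool = OOBTree()
pool["id-1"] = {"name": "alice", "city": "Berlin"}
pool["id-2"] = {"name": "bob", "city": "Cairo"}
pool["id-3"] = {"name": "carol", "city": "Berlin"}

# One index per searchable field: field value -> set of matching ids.
city_index = OOBTree()
for record_id, record in pool.items():
    if record["city"] not in city_index:
        city_index[record["city"]] = OOTreeSet()
    city_index[record["city"]].add(record_id)


def find_by_city(city):
    """Return matching records without scanning the whole pool."""
    ids = city_index.get(city, ())
    return [pool[record_id] for record_id in sorted(ids)]


berlin_names = [r["name"] for r in find_by_city("Berlin")]
```

The lookup only touches the index entry for the requested value and
then the matching keys in the pool, so its cost depends on the number
of matches rather than on the size of the pool.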

I personally chose to solve this problem by building myself a small
container/retrieval framework that uses a combination of BTrees,
UUIDs, zope.schema, and repoze.catalog (this assumes that all my
items are keyed by UUIDs).  Example:

> I don't want to iterate the whole thing over and over again to get the
> matched entries.
> If i would have 100.000 - 500.000 entries, where 150 would match, I don't
> want to go over those completely.

You need indexes: either use a ZODB-native catalog/search framework
like repoze.catalog/Hypatia (or Souper), an external search service
like Solr/ElasticSearch, or build your own indexes using two BTrees
each (a forward mapping of values to lists of matching ids, and a
reverse mapping of ids to values).
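
A rough sketch of such a two-BTree index, under my own assumptions:
the class name FieldIndex and its methods are hypothetical, ids are
kept in sets rather than lists, and the dict/set fallback exists only
so the snippet runs without the BTrees package.  The reverse mapping
is what lets you re-index or remove a record without searching the
forward mapping for its old value:

```python
try:
    from BTrees.OOBTree import OOBTree, OOTreeSet
except ImportError:
    # Fallback so the sketch runs without the BTrees package installed.
    OOBTree, OOTreeSet = dict, set


class FieldIndex(object):
    """Hypothetical DIY index for one field, built from two mappings."""

    def __init__(self):
        self.forward = OOBTree()  # value -> set of ids with that value
        self.reverse = OOBTree()  # id -> value currently indexed for it

    def index(self, record_id, value):
        # Drop any stale entry first so re-indexing a changed record
        # does not leave the id under its old value.
        self.unindex(record_id)
        if value not in self.forward:
            self.forward[value] = OOTreeSet()
        self.forward[value].add(record_id)
        self.reverse[record_id] = value

    def unindex(self, record_id):
        if record_id not in self.reverse:
            return
        value = self.reverse[record_id]
        del self.reverse[record_id]
        ids = self.forward[value]
        ids.remove(record_id)
        if len(ids) == 0:
            # Do not keep empty sets around for values nobody has.
            del self.forward[value]

    def search(self, value):
        """Return the sorted ids whose indexed value equals `value`."""
        return sorted(self.forward.get(value, ()))


idx = FieldIndex()
idx.index("id-1", "Berlin")
idx.index("id-2", "Cairo")
idx.index("id-1", "Cairo")  # id-1 changed; the old entry is replaced
cairo_ids = idx.search("Cairo")
berlin_ids = idx.search("Berlin")
```

In a real application the index would live in the ZODB next to the
pool and be updated whenever a record is added, changed, or removed.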

> So, if there is a pythonic way to get it solved on a performant way, or a
> ZODB way as well that would be wondefull.

You need indexes; most folks choose to use some sort of library to
solve this problem instead of DIY indexes, but that's up to you.

For more information about ZODB, see http://zodb.org/

ZODB-Dev mailing list  -  ZODB-Dev@zope.org