Dominique Lederer wrote:
Christian Theune wrote:
On Saturday, 05.05.2007, at 17:42 +0200, Dominique Lederer wrote:

i would like to retrieve a number of *random* entries out of a catalog's field index.

i tried it by first getting the catalog index length and then accessing a
randomized list index, but this is very slow because of the large number of
entries in the index.

do you know any better solution?

I'm kind of guessing here.
You say you are:

- querying the catalog
- accessing a random index from the result set
- noticing that this is slow

Does this only happen if the index is very large, e.g. you're retrieving
an element from the end of the result set?

I don't know exactly how the result sets are organized, but this
behaviour would imply that loading a later element triggers something
like loading the earlier elements too. I can't really imagine that.

I think the general reason this is slow lies in the fact that
randomly selecting elements means
a) having access to the full list of things
b) applying a sort
Comparison-based sorting has a complexity of O(n log n), which becomes slow
enough for large sets that it's noticeable.
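
As a side note, drawing a few random elements does not strictly require a
sort. A minimal sketch, assuming only that the result set can be iterated
once: reservoir sampling keeps k items in a single pass. The helper name
reservoir_sample is made up for illustration and is not part of the catalog
API.

```python
import random


def reservoir_sample(iterable, k, rng=random):
    """Return up to k items chosen uniformly at random from iterable.

    Single pass, O(n) time, O(k) extra memory, no sorting involved.
    """
    reservoir = []
    for i, item in enumerate(iterable):
        if i < k:
            # Fill the reservoir with the first k items.
            reservoir.append(item)
        else:
            # Keep the new item with probability k / (i + 1).
            j = rng.randint(0, i)  # inclusive on both ends
            if j < k:
                reservoir[j] = item
    return reservoir


# Stand-in for a large result set of integer ids.
sample = reservoir_sample(range(10000), 5)
```

This still touches every element once, so it only helps when the cost lies
in sorting or in materializing the whole list, not in the iteration itself.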

BTW: How large is large?


hi, thanks for the reply, i just managed to improve the performance of my query

what i wanted to do was:

- retrieve the len() of the catalog index
- retrieve a list() of the Resultset
- access n random results and their objects

to retrieve a random object i did:

query = catalog.apply({'myIndex': (None, None)})
length = len(query)
index_intids = list(query)
intid = index_intids[random.randint(0, length - 1)]
object = getObject(intid)

which was slow with 10000 items in the index (i had to wait 2-3 seconds for a
view to render)

after looking into the field index implementation i changed the above lines to:

length = len(catalog['myIndex']._rev_index)

If you are using a FieldIndex, use:

length = catalog['myIndex'].documentCount()

The FieldIndex holds a counter with the number of entries in the _rev_index.

index_intids = list(catalog['myIndex']._rev_index.keys())

which now works like a charm.
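
The direct-index approach above can be sketched roughly as follows. This is
only an illustration under stated assumptions: a plain dict stands in for
catalog['myIndex']._rev_index (a private attribute that maps ids to indexed
values), and random_intids is a hypothetical helper name.

```python
import random


def random_intids(rev_index, n, rng=random):
    """Pick up to n distinct ids from a docid -> value mapping."""
    intids = list(rev_index.keys())  # one pass over the keys
    n = min(n, len(intids))          # len() of a plain list is cheap
    return rng.sample(intids, n)     # n distinct ids, no sorting needed


# Stand-in for catalog['myIndex']._rev_index (assumption, not the real index).
rev_index = {intid: "value-%d" % intid for intid in range(10000)}
picked = random_intids(rev_index, 5)
```

Each picked id would then be resolved to its object the same way as before,
for example with getObject(intid).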

i am not an expert with BTrees, so i can't really say what the problem is/was.

len() on a BTree is slow because it needs to iterate over all keys to count them!
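
To illustrate why a stored counter helps, here is a toy sketch of the
bookkeeping. CountedIndex and count_by_iteration are made-up names and this
is not the real FieldIndex code. A plain dict is used as the mapping, so its
own len() is already cheap; the point is only to contrast counting by
iteration with reading a maintained counter.

```python
class CountedIndex:
    """Toy index that keeps a running document count (hypothetical)."""

    def __init__(self):
        self._rev_index = {}  # docid -> indexed value
        self._num_docs = 0    # updated on every add and remove

    def index_doc(self, docid, value):
        # Only count ids that were not indexed before.
        if docid not in self._rev_index:
            self._num_docs += 1
        self._rev_index[docid] = value

    def unindex_doc(self, docid):
        if docid in self._rev_index:
            del self._rev_index[docid]
            self._num_docs -= 1

    def documentCount(self):
        # Constant time: just read the stored counter.
        return self._num_docs


def count_by_iteration(mapping):
    """Count entries by walking every key, linear in the number of keys."""
    return sum(1 for _ in mapping)


index = CountedIndex()
for docid in range(1000):
    index.index_doc(docid, docid % 7)
```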

If possible, always avoid using the catalog and use the index directly; it is much faster!


Zope3-users mailing list
