Jonathan Gardner wrote:
> I'm no expert in BDBs, but I have spent a fair amount of time working
> with PostgreSQL and Oracle. It sounds like you need to put some
> optimization into your algorithm and data representation.
>
> I would do pretty much like you are doing, except I would only have the
> following relations:
>
> - word to word ID
> - filename to filename ID
> - word ID to filename ID
>
> You're going to want an index on pretty much every column in this
> database.

Stop! I'm not a DB expert either, but putting indexes everywhere is a
well-known DB antipattern. An index is only useful if the indexed field
is selective enough (i.e. as few records as possible should share the
same value for that field). Otherwise, the indexed lookup may end up
taking far more time than a simple linear scan. Also, indexes slow down
write operations.

> That's because you're going to lookup by any one of these
> columns for the corresponding value.
>
> I said I wasn't an expert in BDBs. But I do have some experience
> building up large databases. In the first stage, you just accumulate
> the data. Then you build the indexes only as you need them.

Yes. And only where it makes sense.

(snip)

> And your idea of hundreds of thousands of tables? Very bad. Don't do
> it.

+100 on this.

-- 
bruno desthuilliers
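
To make the advice in this thread concrete (three relations, load the
data first, then index only the columns you actually look up by), here
is a minimal sketch. It assumes SQLite through Python's sqlite3 module
rather than the BDB under discussion, and the table, index and function
names (build_db, files_containing, idx_words_word, ...) are invented for
the example, not taken from anyone's code.

```python
import sqlite3


def build_db(docs):
    """Load {filename: iterable of words} into three relations.

    Hypothetical helper: indexes are created only after the bulk load.
    """
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        "CREATE TABLE words (id INTEGER PRIMARY KEY, word TEXT);"
        "CREATE TABLE files (id INTEGER PRIMARY KEY, name TEXT);"
        "CREATE TABLE word_files (word_id INTEGER, file_id INTEGER);"
    )

    # Stage 1: accumulate the data, with no secondary indexes to maintain.
    word_ids = {}   # word -> word ID, kept in memory during the load
    pairs = set()   # distinct (word ID, file ID) pairs
    for file_id, (name, words) in enumerate(sorted(docs.items()), start=1):
        conn.execute("INSERT INTO files VALUES (?, ?)", (file_id, name))
        for word in words:
            word_id = word_ids.setdefault(word, len(word_ids) + 1)
            pairs.add((word_id, file_id))
    conn.executemany(
        "INSERT INTO words VALUES (?, ?)",
        [(word_id, word) for word, word_id in word_ids.items()],
    )
    conn.executemany("INSERT INTO word_files VALUES (?, ?)", sorted(pairs))

    # Stage 2: index only the columns the lookup below filters or joins on.
    conn.execute("CREATE INDEX idx_words_word ON words (word)")
    conn.execute("CREATE INDEX idx_word_files_word ON word_files (word_id)")
    conn.commit()
    return conn


def files_containing(conn, word):
    """Return the sorted names of the files that contain `word`."""
    rows = conn.execute(
        "SELECT f.name FROM words w "
        "JOIN word_files wf ON wf.word_id = w.id "
        "JOIN files f ON f.id = wf.file_id "
        "WHERE w.word = ? ORDER BY f.name",
        (word,),
    )
    return [name for (name,) in rows]


if __name__ == "__main__":
    db = build_db({"a.txt": ["spam", "eggs"], "b.txt": ["spam"]})
    print(files_containing(db, "spam"))
```

Note that word_files.file_id is deliberately left unindexed here: this
sketch never looks rows up by it, so an index on it would only add write
cost. If you later need "all words in a file", that is the moment to add
it.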