Dan Stromberg wrote:
> I've been putting a little bit of time into a file indexing engine
[...]
To solve the O.P.'s first problem, the facility we need is an
efficient externally-stored multimap. A multimap is like a map,
except that each key is associated with a collection of values,
not just a s
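
To make the multimap idea concrete, here is a minimal in-memory sketch using only the standard library. It is not the externally-stored structure the post calls for, and the names build_index and docs are hypothetical, chosen for illustration. It only shows the shape of the mapping: one word, many filenames.

```python
from collections import defaultdict

def build_index(docs):
    """Map each word to the set of filenames that contain it."""
    index = defaultdict(set)
    for filename, text in docs.items():
        for word in text.lower().split():
            index[word].add(filename)
    return index

# Hypothetical sample data, for illustration only.
docs = {
    "a.txt": "the quick brown fox",
    "b.txt": "the lazy dog",
}
index = build_index(docs)
print(sorted(index["the"]))
```

An on-disk version would keep the same word-to-collection interface but swap the dict for a persistent store.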
Dan Stromberg:
>Rene Pijlman:
>> Right. My second attempt would be: a BTree with the word as key, and a
>> BTree of filenames as value
>Would ZODB let me do that?
Yes.
>I'm puzzled, because:
>d1={}
>d={}
>d[d1] = ''
>TypeError: dict objects are unhashable
This is using a dict as _ke
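
A small sketch of the distinction, using plain dicts rather than ZODB BTrees (that substitution is an assumption made to keep the example standard-library only). A mutable mapping cannot be a key, but it works fine as a value. The exact wording of the TypeError message differs between Python versions.

```python
d = {}
inner = {}
key_error = None

# A dict as a KEY fails, because dicts are mutable and not hashable.
try:
    d[inner] = ""
except TypeError as exc:
    key_error = str(exc)

# A dict as a VALUE is fine: word -> {filename: marker}.
d["word"] = inner
d["word"]["file.txt"] = True
print(key_error)
print(d)
```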
On Fri, 17 Feb 2006 12:32:52 +0100, Rene Pijlman wrote:
> Dan Stromberg:
>>> My first attempt would be: a BTree with the word as key, and a 'list of
>>> filenames' as value.
>>> http://www.zope.org/Wikis/ZODB/FrontPage/guide/node6.html#SECTION00063
>>
>>This is basically what I'm d
Dan Stromberg:
>> My first attempt would be: a BTree with the word as key, and a 'list of
>> filenames' as value.
>> http://www.zope.org/Wikis/ZODB/FrontPage/guide/node6.html#SECTION00063
>
>This is basically what I'm doing now,
Right. My second attempt would be: a BTree with the
Dan Stromberg wrote:
> Bryan Olson wrote:
[...]
>> Well, you could use simple files instead of fancy database tables.
>
> That's an interesting thought. Perhaps especially if australopithecine
> were saved in a filename like:
>
> ~/indices/au/st/ra/lo/pi/th/ec/in/e
Right, though the better fi
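
For illustration, the path layout in the quoted example can be produced by splitting the word into two-character chunks. This is only a sketch of that one scheme; the function name shard_path and its root and width parameters are hypothetical.

```python
def shard_path(word, root="~/indices", width=2):
    """Build a nested path from fixed-width chunks of the word."""
    chunks = [word[i:i + width] for i in range(0, len(word), width)]
    return "/".join([root] + chunks)

print(shard_path("australopithecine"))
# -> ~/indices/au/st/ra/lo/pi/th/ec/in/e
```

One caveat with this layout: a short word such as "au" maps to a path that is also a directory for longer words, so a real implementation would need to store the data under a distinct leaf name.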
On Thu, 16 Feb 2006 10:09:42 +0100, Rene Pijlman wrote:
> Dan Stromberg:
>>is there a python database interface that would allow me to define a
>>-lot- of tables? Like, each word becomes a table, and then the fields
>>in that table are just the filenames that contained that word.
>
> Give ZODB
About indexes everywhere: Yes, you don't have to be a DB expert to know
that indexes everywhere is bad. But look at this example. There are
really two ways that the data is going to get accessed in regular use.
Either they are going to ask for all files that have a word (most
likely) or they are go
About the filename ID - word ID table: Any good database (good with
large amounts of data) will handle the memory management for you. If
you get enough data, it may make sense to get bothered with PostgreSQL.
That has a pretty good record on handling very large sets of data, and
intermediate sets a
On Wed, 15 Feb 2006 23:37:31 -0800, Jonathan Gardner wrote:
> I'm no expert in BDBs, but I have spent a fair amount of time working
> with PostgreSQL and Oracle. It sounds like you need to put some
> optimization into your algorithm and data representation.
>
> I would do pretty much like you are
On Thu, 16 Feb 2006 13:45:28 +, Bryan Olson wrote:
> Dan Stromberg wrote:
>> I've been putting a little bit of time into a file indexing engine
> [...]
>
>> So far, I've been taking the approach of using a single-table database
>> like gdbm or dbhash [...] and making each entry keyed by
>> a
Dan Stromberg wrote:
> I've been putting a little bit of time into a file indexing engine
[...]
> So far, I've been taking the approach of using a single-table database
> like gdbm or dbhash [...] and making each entry keyed by
> a word, and under the word in the database is a null terminated list
Jonathan Gardner wrote:
> I'm no expert in BDBs, but I have spent a fair amount of time working
> with PostgreSQL and Oracle. It sounds like you need to put some
> optimization into your algorithm and data representation.
>
> I would do pretty much like you are doing, except I would only have the
Dan Stromberg:
>is there a python database interface that would allow me to define a
>-lot- of tables? Like, each word becomes a table, and then the fields
>in that table are just the filenames that contained that word.
Give ZODB a try.
http://www.zope.org/Wikis/ZODB/FrontPage
http://www.pytho
I'm no expert in BDBs, but I have spent a fair amount of time working
with PostgreSQL and Oracle. It sounds like you need to put some
optimization into your algorithm and data representation.
I would do pretty much like you are doing, except I would only have the
following relations:
- word to wo