Re: Handling large datastore search

Dave Angel Tue, 03 Nov 2009 18:17:52 -0800

Ahmed Barakat wrote:

In case I have a huge datastore (10000 entries, each entry has like 6
properties), what is the best way
to handle the search within such a huge datastore, and what if I want to
make a generic search, for example
you write a word and I use it to search within all properties I have for all
entries?


Is the conversion to XML a good solution, or not?

Sorry for being new to web development, and Python.

Thanks in advance.

I don't see anything about your query which is specific to web development, and there's no need to be apologetic for being new anyway.

One person's "huge" is another person's "pretty large." I'd say 10000 items is pretty small if you're working on the desktop, as you can readily hold all the data in "memory." I edit text files bigger than that. But I'll assume your data really is huge, or will grow to be huge, or is an environment which treats it as huge.
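To make the "hold it all in memory" case concrete, here is a minimal sketch of a generic search across every property of every entry. The entry layout, the property names, and the `search_all` helper are all made up for illustration; nothing here comes from the original poster's data.

```python
def search_all(entries, word):
    """Return the entries in which any property value contains `word`.

    Plain linear scan, case-insensitive. `entries` is assumed to be a
    list of dicts (a hypothetical layout, not the poster's real one).
    """
    needle = word.lower()
    return [
        entry for entry in entries
        if any(needle in str(value).lower() for value in entry.values())
    ]


# Hypothetical data set: 10000 entries with 6 properties each.
entries = [
    {
        "id": i,
        "name": f"item{i}",
        "color": "red" if i % 2 else "blue",
        "city": "Cairo" if i % 3 == 0 else "Boston",
        "note": f"note {i}",
        "tag": f"t{i % 10}",
    }
    for i in range(10000)
]

hits = search_all(entries, "cairo")
```

At this size the scan touches 60000 short strings per query, which a desktop machine handles without any index at all.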

When you're parsing large amounts of data, there are always tradeoffs between performance and other characteristics, usually size and complexity. If you have lots of data, you're probably best off by using a standard code system -- a real database. The developers of such things have decades of experience in making certain things fast, reliable, and self-consistent.

But considering only speed here, I have to point out that you have to understand databases, and your particular model of database, pretty well to really benefit from all the performance tricks in there. Keeping it abstract, you specify which parts of the data you need fast random access to. If you want fast search access to "all" of it, your database will generally be huge, and very slow to update. And the best way to avoid that is to pick a database mechanism that best fits your search mechanism. I hate to think how many man-centuries Google has dedicated to getting fast random word access to its *enormous* database. I'm sure they did not build on a standard relational model.
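As one possible illustration of "specifying what you want fast access to", here is a sketch using the sqlite3 module from the Python standard library. The table, the column names, and the sample rows are assumptions made up for the example. The indexed column can be looked up directly, while the "search everything" query uses LIKE with a leading wildcard and so has to scan every row.

```python
import sqlite3

# Hypothetical rows: (id, name, city, note). Not the poster's real schema.
rows = [
    (i, f"item{i}", "Cairo" if i % 3 == 0 else "Boston", f"note {i}")
    for i in range(1, 1001)
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE entries (id INTEGER PRIMARY KEY, name TEXT, city TEXT, note TEXT)"
)
conn.executemany(
    "INSERT INTO entries (id, name, city, note) VALUES (?, ?, ?, ?)", rows
)
# Declare that fast lookups by city matter; updates now maintain this index too.
conn.execute("CREATE INDEX idx_entries_city ON entries (city)")
conn.commit()


def find_by_city(conn, city):
    """Equality lookup on the indexed column."""
    return conn.execute(
        "SELECT id, name FROM entries WHERE city = ?", (city,)
    ).fetchall()


def find_anywhere(conn, word):
    """Generic search over all text columns; this is a full table scan.

    Note: % and _ inside `word` act as LIKE wildcards in this sketch.
    """
    return conn.execute(
        "SELECT id, name FROM entries "
        "WHERE name LIKE '%' || ? || '%' "
        "   OR city LIKE '%' || ? || '%' "
        "   OR note LIKE '%' || ? || '%'",
        (word, word, word),
    ).fetchall()


def query_plan(conn, sql, params=()):
    """Return SQLite's plan descriptions for a query (last column of each row)."""
    return [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql, params)]


plan = query_plan(conn, "SELECT id, name FROM entries WHERE city = ?", ("Cairo",))
```

Printing `plan` is a quick way to check whether a given query actually uses the index or falls back to scanning the table.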

If you plan to do it yourself, I'd say the last thing you want to do is use XML. XML may be a convenient way to store self-describing data, but it's not quick to parse large amounts of it. Instead, store the raw data in text form, with separate index files describing what is where. Anything that's indexed will be found rapidly, while anything that isn't will require a search of the raw data.
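A rough sketch of the "raw text plus separate index file" idea follows. Everything in it is an assumption for illustration: one tab-separated record per line, a JSON index file mapping each lowercased word to the byte offsets of the lines containing it, and the helper names `build_index` and `lookup`.

```python
import json
import os
import tempfile


def build_index(data_path, index_path):
    """Scan the raw data once and write a word -> [byte offsets] index file."""
    index = {}
    with open(data_path, "rb") as data:
        while True:
            offset = data.tell()          # byte position where this line starts
            line = data.readline()
            if not line:
                break
            for word in set(line.decode("utf-8").lower().split()):
                index.setdefault(word, []).append(offset)
    with open(index_path, "w", encoding="utf-8") as out:
        json.dump(index, out)


def lookup(data_path, index_path, word):
    """Find matching records by seeking straight to the indexed offsets."""
    with open(index_path, encoding="utf-8") as f:
        index = json.load(f)
    results = []
    with open(data_path, "rb") as data:
        for offset in index.get(word.lower(), []):
            data.seek(offset)
            results.append(data.readline().decode("utf-8").rstrip("\n"))
    return results


# Hypothetical sample data, written to a temporary directory.
workdir = tempfile.mkdtemp()
data_path = os.path.join(workdir, "entries.txt")
index_path = os.path.join(workdir, "entries.idx.json")
records = [
    "1\twidget\tred\tCairo",
    "2\tsprocket\tgreen\tBoston",
    "3\tgadget\tblue\tCairo",
]
with open(data_path, "wb") as f:
    f.write("\n".join(records).encode("utf-8") + b"\n")

build_index(data_path, index_path)
matches = lookup(data_path, index_path, "cairo")
```

This only finds whole words that were indexed; a substring or a word that was never indexed still needs a scan of the raw file, and the index has to be rebuilt or patched whenever the data changes.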

There are algorithms for searching raw data that are faster than scanning every byte, but a relevant index will almost always be faster.
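One example of such an algorithm (my pick for illustration, the post does not name one) is Boyer-Moore-Horspool, which looks at the last byte of the current window and skips ahead by more than one position when it can. A minimal sketch, with `horspool_find` being a made-up name:

```python
def horspool_find(text: bytes, pattern: bytes) -> int:
    """Return the first index of `pattern` in `text`, or -1 if absent."""
    m, n = len(pattern), len(text)
    if m == 0:
        return 0
    # For each byte of the pattern except the last one, record how far its
    # last occurrence is from the end of the pattern. Later occurrences
    # overwrite earlier ones, which keeps the smallest (safe) shift.
    shift = {byte: m - 1 - i for i, byte in enumerate(pattern[:-1])}
    pos = 0
    while pos <= n - m:
        if text[pos:pos + m] == pattern:
            return pos
        # Skip based on the byte aligned with the end of the pattern;
        # bytes that never occur in the pattern allow a full-length jump.
        pos += shift.get(text[pos + m - 1], m)
    return -1
```

For example, `horspool_find(b"hello world", b"world")` checks only three alignments before finding the match at index 6. In real code the built-in `bytes.find` is the sensible choice; the sketch is only meant to show where the skipping comes from.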


DaveA

--
http://mail.python.org/mailman/listinfo/python-list
