I think it is not only the database that matters here; it is also the
clustering technique they use. If you have a distributed array of
computers and a fast Internet connection, you can build a huge database
in a day or two. The real problem is deciding what a user actually wants
when they type in a topic of interest. This boils down to how you
cluster your database so that you can pick out the particular sites the
user wants. Designing an algorithm for clustering or grouping large
datasets is normally extremely difficult and computationally intensive,
because the data is generally multi-dimensional (just imagine computing
the derivatives, partial derivatives and nth power of a 1000 x 1000
matrix), and you have to know as much as possible about who the user is
(gender, nationality, age, education, job, etc.). This is why Google, as
far as I know, requires Beowulf computing just to cluster their database
quickly. They use Python, not Perl, though :)
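
To make the grouping idea concrete, here is a toy sketch in Python. It
is only an illustration and certainly not what Google runs: it uses
plain k-means on word counts, and the sample documents, the function
names (vectorize, kmeans) and the choice of seed documents are all made
up for the example. Each word becomes one dimension, which is why real
collections get so high-dimensional so quickly.

```python
from collections import Counter
import math


def vectorize(docs):
    """Turn each document into a word-count vector over a shared vocabulary."""
    vocab = sorted({word for doc in docs for word in doc.lower().split()})
    vectors = []
    for doc in docs:
        counts = Counter(doc.lower().split())
        vectors.append([float(counts[word]) for word in vocab])
    return vocab, vectors


def distance(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def kmeans(vectors, seeds, iterations=10):
    """Plain k-means. `seeds` lists the indices of the vectors used as
    the initial centroids (a hypothetical, hand-picked starting point)."""
    centroids = [list(vectors[i]) for i in seeds]
    labels = [0] * len(vectors)
    for _ in range(iterations):
        # Assignment step: attach every vector to its nearest centroid.
        labels = [
            min(range(len(centroids)), key=lambda c: distance(v, centroids[c]))
            for v in vectors
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(len(centroids)):
            members = [v for v, label in zip(vectors, labels) if label == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


# Made-up sample "pages", three about kernels and three about scripting.
docs = [
    "linux kernel patch",
    "linux kernel module",
    "kernel patch release",
    "python script tutorial",
    "python tutorial beginner",
    "perl script tutorial",
]
vocab, vectors = vectorize(docs)
labels = kmeans(vectors, seeds=[0, 3])
print(labels)
```

Even this six-document example already has a ten-dimensional vocabulary,
and every pass compares each document against each centroid, so the cost
grows with documents x clusters x vocabulary size. That is the part that
gets expensive at web scale.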

rowel

On Mon, 29 Apr 2002, fooler wrote:

> as of the moment yes, because Google has a big cache storage or database
> of the entire web on the net... let us see when Teoma reaches that :->
>
> fooler.
>

_
Philippine Linux Users Group. Web site and archives at http://plug.linux.org.ph