Hello,

I have to figure out how much hardware my company needs to run
clustering on 10+ million user accounts, each with 100-5000
documents. The documents will be indexed, so the vectors will be
created at indexing time.
Is there a formula to approximate the size of the vectors from the
index size? I'm looking for rough estimates (how much extra disk
space should I plan for?).
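
To make the scale concrete, here is a rough sketch of the kind of
arithmetic I have in mind. Only the account and document counts come
from my numbers above; the function name, the 100 non-zero terms per
vector and the 12 bytes per entry (4-byte term index plus 8-byte
weight) are placeholder assumptions, not measurements.

```python
def estimate_vector_storage(users, docs_per_user, nnz_per_doc, bytes_per_entry):
    """Return (total documents, raw bytes for sparse vectors).

    Assumes one sparse vector per document and ignores compression
    and any container/file-format overhead.
    """
    total_docs = users * docs_per_user
    total_bytes = total_docs * nnz_per_doc * bytes_per_entry
    return total_docs, total_bytes


USERS = 10_000_000      # from the question: 10+ million accounts
NNZ_PER_DOC = 100       # assumption: non-zero terms per document vector
BYTES_PER_ENTRY = 12    # assumption: 4-byte index + 8-byte weight

# Lower and upper bound on documents per account, from the question.
for docs_per_user in (100, 5000):
    docs, size = estimate_vector_storage(
        USERS, docs_per_user, NNZ_PER_DOC, BYTES_PER_ENTRY
    )
    print(f"{docs_per_user} docs/user: {docs:,} docs, {size / 1e12:.1f} TB")
```

What I don't know is how realistic those per-document figures are, or
how they relate to the size of the index itself.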

Which tasks are the most time consuming? In my experience with
clustering, index/vector creation takes the most time, with the
clustering itself second. Does anyone have data on how long a
clustering job takes?

Thanks,

-- 
Ioan Eugen Stan
http://ieugen.blogspot.com/
