Hello,

I have to figure out how much hardware is required to run clustering for my company on 10+ million user accounts, each with 100-5000 documents. The documents will be indexed, so the vectors will be created at indexing time. Is there a formula to approximate the size of the vectors from the index size? I'm looking for rough estimates (how much extra disk space should I plan for?).
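
To show the kind of back-of-the-envelope estimate I have in mind, here is a minimal sketch. Only the user and document counts come from the numbers above. The average number of non-zero terms per document (200) and the 12 bytes per sparse entry (a 4-byte term index plus an 8-byte weight) are placeholder assumptions, not measurements, and the function name is made up for illustration.

```python
def estimate_vector_bytes(num_users, docs_per_user,
                          nonzero_terms_per_doc, bytes_per_entry=12):
    """Rough size of sparse document vectors, ignoring compression and overhead.

    bytes_per_entry=12 assumes a 4-byte term index plus an 8-byte weight
    (an assumption for illustration, not a property of any specific library).
    """
    total_docs = num_users * docs_per_user
    return total_docs * nonzero_terms_per_doc * bytes_per_entry


USERS = 10_000_000   # from the question: 10+ million accounts
ASSUMED_NNZ = 200    # hypothetical average of distinct terms per document

for docs_per_user in (100, 5000):  # lower and upper bound from the question
    size = estimate_vector_bytes(USERS, docs_per_user, ASSUMED_NNZ)
    total_docs = USERS * docs_per_user
    print(f"{docs_per_user} docs/user: {total_docs:.1e} docs, "
          f"~{size / 1e12:.1f} TB of vectors")
```

Under these assumptions the result scales linearly with each input, so a measured terms-per-document figure from a sample index could simply be substituted for the placeholder.
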
Which tasks are the most time-consuming? In my experience with clustering, index/vector creation takes the most time, with the clustering itself second. Does anyone have data on how long a clustering job takes?

Thanks,
--
Ioan Eugen Stan
http://ieugen.blogspot.com/
