On Fri, 2009-03-13 at 19:42 -0700, buddha1021 wrote:
> hi dennis:
...
> I am confident that hadoop can process the large datas of the www search
> engine! But lucene? I am afraid of the limited size of lucene's index per
> server is very little ,10G? or 30G? this is not enough for the www search
> engine! IMO, this is a bottleneck!

I agree that the current answer to the problem of accessing Lucene
indexes quickly is to keep them small. What good is an index spread
across a cloud if accessing it takes hours?

In my view this should be one of Nutch's core competencies: making
search in BIG indexes fast (as fast as in SMALL indexes).

salu2
-- 
Thorsten Scherler <thorsten.at.apache.org>
Open Source <consulting, training and solutions>