> Anyone with experience, suggestions or lessons learned in the 10-100 TB
> scale they'd like to share?
> Researching optimum design for a Solr Cloud with, say, about 20TB index.
We're building a web archive with a projected index size of 20TB (distributed across 20 shards). Some test results and a short write-up are at http://sbdevel.wordpress.com/2013/12/06/danish-webscale/ - feel free to ask for more details.

tl;dr: We're saying to hell with RAM for caching and putting it all on SSDs on a single big machine. Results so far (some distributed tests with 200GB & 400GB indexes, some single tests with a production index of 1TB) are very promising for plain keyword search, grouping and faceting (DocValues rocks).

- Toke Eskildsen