Thanks all. I have the same index with a slightly different schema and 200M documents, installed on 3 r3.xlarge instances (30GB RAM and 600 General Purpose SSD). The index is about 1.5TB in size and receives many updates every 5 minutes. Complex queries with faceting have a response time of 100ms, which is acceptable for us.
Toke Eskildsen,

Is the index updated while you are searching?
*No*

Do you do any faceting or other heavy processing as part of a search?
*No*

How many hits does a search typically have and how many documents are returned?
*The test measured QTime only, with no documents returned and the number of hits varying from 50,000 to 50,000,000.*

How many concurrent searches do you need to support? How fast should the response time be?
*Maybe 100 concurrent searches at 100ms with facets.*

Is splitting the shard into two shards on the same node, so that each shard sits on a single EBS volume, better than using LVM?

Thanks

On Mon, Dec 29, 2014 at 2:00 AM, Toke Eskildsen <t...@statsbiblioteket.dk> wrote:

> Mahmoud Almokadem [prog.mahm...@gmail.com] wrote:
> > We've installed a cluster of one collection of 350M documents on 3
> > r3.2xlarge (60GB RAM) Amazon servers. The size of index on each shard is
> > about 1.1TB and maximum storage on Amazon is 1 TB so we add 2 SSD EBS
> > General purpose (1x1TB + 1x500GB) on each instance. Then we create logical
> > volume using LVM of 1.5TB to fit our index.
>
> Your search speed will be limited by the slowest storage in your group,
> which would be your 500GB EBS. The General Purpose SSD option means (as far
> as I can read at http://aws.amazon.com/ebs/details/#piops) that your
> baseline of 3 IOPS/GB = 1500 IOPS, with bursts of 3000 IOPS. Unfortunately
> they do not say anything about latency.
>
> For comparison, I checked the system logs from a local test with our 21TB
> / 7 billion documents index. It used ~27,000 IOPS during the test, with
> mean search time a bit below 1 second. That was with ~100GB RAM for disk
> cache, which is about ½% of index size. The test was with simple term
> queries (1-3 terms) and some faceting. Back of the envelope: 27,000 IOPS
> for 21TB is ~1300 IOPS/TB. Your indexes are 1.1TB, so 1.1*1300 IOPS ~= 1400
> IOPS.
>
> All else being equal (which is never the case), getting 1-3 second
> response times for a 1.1TB index, when one link in the storage chain is
> capped at a few thousand IOPS, you are using networked storage and you have
> little RAM for caching, does not seem unrealistic. If possible, you could
> try temporarily boosting performance of the EBS, to see if raw IO is the
> bottleneck.
>
> > The response time is between 1 and 3 seconds for simple queries (1 token).
>
> Is the index updated while you are searching?
> Do you do any faceting or other heavy processing as part of a search?
> How many hits does a search typically have and how many documents are
> returned?
> How many concurrent searches do you need to support? How fast should the
> response time be?
>
> - Toke Eskildsen
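
For anyone who wants to redo the back-of-the-envelope numbers from the quoted message with their own figures, here is a minimal Python sketch. It only repeats the arithmetic in the thread: 27,000 IOPS observed on a 21TB index, scaled to a 1.1TB shard, and compared with the 1500 IOPS baseline of a 500GB General Purpose SSD volume (3 IOPS per GB, bursting to 3000). The function names are made up for illustration, and the linear scaling of IOPS with index size is an assumption, not something measured in this thread.

```python
# Rough IOPS estimate from the figures quoted in this thread.
# All helper names are hypothetical; linear scaling is an assumption.


def iops_per_tb(observed_iops: float, index_tb: float) -> float:
    """IOPS used per TB of index in the reference test."""
    return observed_iops / index_tb


def estimated_iops(index_tb: float, rate_per_tb: float) -> float:
    """Scale the reference rate linearly to another index size."""
    return index_tb * rate_per_tb


def gp_ssd_baseline_iops(volume_gb: float, iops_per_gb: float = 3.0) -> float:
    """Baseline IOPS of a General Purpose SSD volume at 3 IOPS per GB."""
    return volume_gb * iops_per_gb


# Reference test: ~27,000 IOPS on a 21TB index.
reference_rate = iops_per_tb(27_000, 21)

# Shard from the thread: 1.1TB of index.
needed = estimated_iops(1.1, reference_rate)

# Slowest volume in the LVM group: the 500GB EBS volume.
baseline = gp_ssd_baseline_iops(500)
burst = 3000

print(f"reference rate : {reference_rate:.0f} IOPS/TB")
print(f"estimated need : {needed:.0f} IOPS")
print(f"500GB baseline : {baseline:.0f} IOPS (burst {burst})")
print(f"headroom       : {baseline / needed:.2f}x at baseline")
```

With the numbers from the thread, the estimated need sits just under the baseline of the 500GB volume, which is why that volume is the first thing to suspect. The estimate ignores latency, concurrency and the effect of disk cache, so treat it as a sanity check only.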