Thanks, Shawn. This information actually isn't all that shocking to me. It's always been in the back of my mind that I was "getting away with something" by serving from the m1.large. Remarkably, however, it has served me well for nearly two years; and although the index has not always been 30GB, it has always been much larger than the RAM on the box. As you suggested, I can only assume that our usage patterns and index schema have somehow kept heap usage minimal up to this point.
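To put Shawn's rule of thumb (RAM = on-disk index size + Solr heap + other programs) into numbers for this box, here is a rough Python sketch. It assumes a 4GB Solr heap, which is the example figure from Shawn's reply and not our actual setting, and no other significant programs on the server. `recommended_ram_gb` is a hypothetical helper for illustration only, not part of Solr or any AWS tooling.

```python
def recommended_ram_gb(index_gb: float, heap_gb: float,
                       other_gb: float = 0.0, cache_fraction: float = 1.0) -> float:
    """Hypothetical helper: RAM needed so the OS disk cache can hold
    `cache_fraction` of the on-disk index, plus the Solr heap and other programs."""
    return index_gb * cache_fraction + heap_gb + other_gb


INDEX_GB = 30.0    # on-disk index size from this thread
HEAP_GB = 4.0      # assumption: Shawn's example heap, not our real setting
BOX_RAM_GB = 7.5   # m1.large; 7.5 GiB treated as ~7.5 GB for this rough estimate

ideal = recommended_ram_gb(INDEX_GB, HEAP_GB)                        # whole index cached
minimum = recommended_ram_gb(INDEX_GB, HEAP_GB, cache_fraction=0.5)  # "maybe half" case
cache_left = BOX_RAM_GB - HEAP_GB                                    # RAM left for the OS disk cache

print(f"ideal: {ideal:.1f} GB, bare minimum: {minimum:.1f} GB, box: {BOX_RAM_GB:.1f} GB")
print(f"disk cache on this box: {cache_left:.1f} GB ({cache_left / INDEX_GB:.0%} of the index)")
```

Under those assumptions the ideal is 34GB and the "might get away with it" floor is 19GB, while the m1.large leaves only about 3.5GB (roughly 12% of the index) for the OS disk cache.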
For now, we're going to increase the heap size on the instance and see where that gets us; if that still doesn't suffice, we'll upgrade to a more powerful instance.

Michael, thanks for weighing in. Those i2 instances look delicious indeed. Just curious: have you struggled with garbage collection pauses at all?

On Thu, Jan 30, 2014 at 7:43 PM, Shawn Heisey <s...@elyograg.org> wrote:

> On 1/30/2014 3:20 PM, Joseph Hagerty wrote:
>
>> I'm using Solr 3.5 over Tomcat 6. My index has reached 30G.
>>
>
> <snip>
>
>> - The box is an m1.large on AWS EC2. 2 virtual CPUs, 4 ECU, 7.5 GiB RAM
>>
>
> One detail that you did not provide was how much of your 7.5GB RAM you are
> allocating to the Java heap for Solr, but I actually don't think I need
> that information, because for your index size, you simply don't have
> enough. If you're sticking with Amazon, you'll want one of the instances
> with at least 30GB of RAM, and you might want to consider more memory than
> that.
>
> An ideal RAM size for Solr is equal to the size of on-disk data plus the
> heap space used by Solr and other programs. This means that if your java
> heap for Solr is 4GB and there are no other significant programs running on
> the same server, you'd want a minimum of 34GB of RAM for an ideal setup
> with your index. 4GB of that would be for Solr itself, the remainder would
> be for the operating system to fully cache your index in the OS disk cache.
>
> Depending on your query patterns and how your schema is arranged, you
> *might* be able to get away with as little as half of your index size just
> for the OS disk cache, but it's better to make it big enough for the whole
> index, plus room for growth.
>
> http://wiki.apache.org/solr/SolrPerformanceProblems
>
> Many people are *shocked* when they are told this information, but if you
> think about the relative speeds of getting a chunk of data from a hard disk
> vs. getting the same information from memory, it's not all that shocking.
>
> Thanks,
> Shawn
>
>

-- 
- Joe