bq: ..allocated by the core close for unloaded cores? And how about the processing time for unloaded cores to get it loaded first if we issue a query to it?
Well, all resources are supposed to be returned to the system. Even 500
cores open at one time is a lot, though. My theory is this has nothing to
do with transient or non-transient cores. What's happening here is that
you simply are opening too many cores (eventually) for the memory you're
allocating. Plus, various caches get filled up at different times
depending on the query. Also, if you have, say, 1,000 simultaneous
queries outstanding to 1,000 different cores, _all_ 1,000 will be loaded
in memory at the same time (I'm simplifying a bit here). After 500 of the
queries have been satisfied, the number should drop back.

So here's what I'd do to test whether there's really a memory leak or
you're just being too ambitious: drop the transient cache size to, say,
100 (or 50, or 10). You'll also have to take some care not to flood the
system with lots of queries to lots of different cores, but you should
vary the cores so you cycle through them all. If your process still shows
memory creeping, you'll need to take some memory snapshots so we can
analyze what's going on. And by mixing very different numbers of
documents in your various cores, you're introducing another variable that
will make apples-to-apples comparisons difficult.

The model the LotsOfCores stuff was built to deal with is having 100s to
1,000s of cores, but not very many of them active at once. Consider a
situation where each e-mail user has their own core. A user searches old
e-mails only very rarely, so even with 10,000 cores on a machine only,
say, 10-20 may be active at once. You never know which ones, of course.
Eventually all of them will be used, but rarely very many simultaneously.
So you may be hitting an edge case if you are continually firing queries
at different cores. Loading a core is expensive: all the underlying
caches will be warmed, firstSearcher queries will be fired, etc. And on
only 8G of memory for 500 active cores, it's not surprising that you're
blowing up memory IMO.

Best,
Erick

On Thu, Oct 23, 2014 at 11:28 AM, Xiaolu Zhao <xiaolu.z...@oracle.com> wrote:
> Hi Erick,
>
> Actually we are adding more cores. In this case, we set
> "transientCacheSize=500" and create 16,000 cores in total, each with 10k
> log entries.
>
> During the process, we could easily see JVM memory usage increase as the
> total number of cores grows. It runs out of memory when the total number
> of cores reaches 5,400.
>
> Then we restart Solr and continue creating and loading cores. JVM memory
> usage will rise to over 7GB (max: 8GB), but not exceed the maximum. The
> process can be very slow then; we believe garbage collection may take
> place and cost some time.
>
> How about the resource usage for LotsOfCores (loaded/unloaded), e.g. the
> searcher? Are all resources allocated by the core close for unloaded
> cores? And how about the processing time for unloaded cores to get it
> loaded first if we issue a query to it?
>
> We do the testing to look into the processing time for unloaded cores.
> In this case, we have 100 cores: 1-50 with 100M, 51-55 with 1M, 56-60
> with 10M, 61-70 with 100K, 71-100 with 10K. Then we can query unloaded
> cores with different data sizes to get the processing time for each
> group. Here, the query matches everything: "select?q=*".
>
>  Collection Name       Total Time (ms)   QTime (ms)   Processing Time (ms)
>  collection71 (10K)                418            1                    417
>  collection72 (10K)                413            0                    413
>  collection61 (100K)               439            2                    437
>  collection62 (100K)               424            1                    423
>  collection51 (1M)                 527            5                    522
>  collection52 (1M)                 538            5                    533
>  collection56 (10M)                560           33                    527
>  collection57 (10M)                553           33                    520
>  collection3 (100M)               5971          322                   5649
>  collection4 (100M)               6052          327                   5725
>
> Based on the table above, we can see an ascending trend with larger
> data. But there is a big gap between 10M and 100M.
>
> Thanks,
> Xiaolu
>
>
> On 10/23/2014 9:51 AM, Erick Erickson wrote:
>>
>> Memory should eventually be returned when a core is unloaded. There's
>> a very small amount of overhead for keeping a list of all the cores
>> and their locations, but this shouldn't increase with time unless
>> you're adding more cores.
>>
>> Do note that the transient cache size is fixed, but may be exceeded. A
>> core is held open when it gets reclaimed long enough to serve any
>> outstanding requests, but it _should_ have the memory reclaimed
>> eventually.
>>
>> Of course there's always the possibility of some memory being kept
>> inadvertently; I'd consider that a bug, so if you can define how this
>> happens, perhaps with a test case, that would be great. Dumping the
>> memory would help see what's kept, if anything actually is.
>>
>> Best,
>> Erick
>>
>> On Wed, Oct 22, 2014 at 12:33 PM, Xiaolu Zhao <xiaolu.z...@oracle.com>
>> wrote:
>>>
>>> Hi Erick,
>>>
>>> Thanks a lot for your explanation.
>>>
>>> Last time, when I tried out LotsOfCores, I found JVM memory usage will
>>> increase as the total number of cores grows, though the transient
>>> cache size is fixed. Finally, the JVM will run out of memory when I
>>> have thousands of cores. Does it mean other currently unloaded cores
>>> will consume memory? Or will swapping among loaded/unloaded cores
>>> consume memory?
>>>
>>> Best,
>>> Xiaolu
>>>
>>> On 10/22/2014 12:23 PM, Erick Erickson wrote:
>>>>
>>>> The difference here is that LotsOfCores is intended to cache open
>>>> cores and thus limit the number of currently loaded cores. However,
>>>> cores not currently loaded are available for use; the next request
>>>> that needs such a core will cause it to be loaded (or reloaded).
>>>>
>>>> The admin/core/UNLOAD command, on the other hand, is designed to
>>>> _permanently_ remove the core from Solr. Or at least have it become
>>>> unavailable until another explicit admin/core command is executed to
>>>> bring it back. There is nothing automatic about this.
>>>>
>>>> Another way of looking at it is that LotsOfCores is used in a
>>>> situation where you don't know what requests are coming in, but you
>>>> _can_ predict that not many will be used at once. So if I have 500
>>>> cores, and my expectation is that only 20 of them are used at once,
>>>> there's no good in having the other 480 cores loaded all the time.
>>>> When a query comes in for one of the currently-unloaded cores (call
>>>> it core21), that core is loaded (perhaps displacing one of the
>>>> currently-loaded cores) and the request is served.
>>>>
>>>> If core21 above had been unloaded with the core/admin command, then a
>>>> request directed to it would return an error instead.
>>>>
>>>> Best,
>>>> Erick
>>>>
>>>> On Wed, Oct 22, 2014 at 12:11 PM, Xiaolu Zhao <xiaolu.z...@oracle.com>
>>>> wrote:
>>>>>
>>>>> Hi All,
>>>>>
>>>>> I am confused about the difference between unloading of cores with
>>>>> LotsOfCores and unloading a core with CoreAdmin.
>>>>>
>>>>> From my understanding of LotsOfCores, if one core is removed from the
>>>>> transient cache, it is pending close, which means closing all
>>>>> resources allocated by the core once it is no longer in use, e.g. the
>>>>> searcher, updateHandler... While for unloading a core with CoreAdmin,
>>>>> the core needs to be removed from the cores list, either the ordinary
>>>>> cores list or the transient cores list, and the cores locator will
>>>>> delete it. If this core is loaded but not pending close, it will be
>>>>> closed.
>>>>>
>>>>> Also, one more interesting thing is that if I unload a core with
>>>>> CoreAdmin, "core.properties" will be renamed "core.properties.unloaded".
>>>>> Then this core cannot be found in the Solr API, and the STATUS url
>>>>> won't return its status either. But with LotsOfCores, a core not in
>>>>> the transient cache will still have "core.properties" and can be
>>>>> found through the STATUS url, though it is marked with
>>>>> "isLoaded=false".
>>>>>
>>>>> Could anyone tell me the underlying mechanism for these two cases?
>>>>> Why can LotsOfCores support frequent unloading/loading of cores? Do
>>>>> cores not in the transient cache still consume JVM memory, while
>>>>> cores unloaded with CoreAdmin do not?
>>>>>
>>>>> Thanks,
>>>>> Xiaolu
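
As a footnote to the timing table above: the gap between QTime and total
time is largely the cost of opening a transient core that was not already
in the cache (warming the underlying caches, firing firstSearcher queries,
and so on), plus some network and response-parsing overhead. The sketch
below is a minimal SolrJ 4.x example of the kind of test Erick suggests,
cycling a match-all query across many cores and reporting that gap. The
host, port, class name, and core names are illustrative assumptions, not
anything taken from this thread; the transient behaviour itself is
configured with transientCacheSize in solr.xml and transient=true /
loadOnStartup=false in each core's core.properties.

// Rough SolrJ 4.x sketch (host, port, and core names are made up for
// illustration): fire a match-all query at each core in turn and compare
// Solr's reported QTime with the client-observed elapsed time. For a core
// that was not already in the transient cache, the difference roughly
// reflects the cost of opening the core.
import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.SolrServerException;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.client.solrj.response.QueryResponse;

public class TransientCoreProbe {
  public static void main(String[] args) throws SolrServerException {
    for (int i = 1; i <= 100; i++) {            // cycle through all 100 test cores
      String coreUrl = "http://localhost:8983/solr/collection" + i;
      HttpSolrServer server = new HttpSolrServer(coreUrl);
      try {
        QueryResponse rsp = server.query(new SolrQuery("*:*"));
        long overhead = rsp.getElapsedTime() - rsp.getQTime();
        System.out.printf("collection%d: numFound=%d elapsed=%dms qtime=%dms overhead=%dms%n",
            i, rsp.getResults().getNumFound(),
            rsp.getElapsedTime(), rsp.getQTime(), overhead);
      } finally {
        server.shutdown();                      // release the client's connections
      }
    }
  }
}

If memory still creeps upward with a small transientCacheSize while the
queries are spread across cores like this, that would point at a genuine
leak worth capturing in a heap dump.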