Re: Solr 3.2 filter cache warming taking longer than 1.4.1
On 7/2/2011 12:34 PM, Yonik Seeley wrote:
> OK, I tried a quick test of 1.4.1 vs 3x on optimized indexes (unoptimized had different numbers of segments so I didn't try that). 3x (as of today) was 28% faster at a large filter query (300 terms in one big disjunction, with each term matching ~1000 docs).

A lot of the terms used in my filter queries may match hundreds of thousands or even millions of documents. The largest search group (sg:stdp) matches about 1.4 million out of 9.5 million docs on each shard, and is probably present in most filter queries.

Right now I have the default termIndexInterval of 128, and a setTermIndexDivisor of 8. I think this probably has the same memory footprint as a termIndexInterval of 1024, but because it can do seeks in the tii file (taking good advantage of disk cache) before it ultimately seeks in the tis file, there are probably fewer seeks. My warm time is slightly better than it was with the interval at 1024, and my average query speed hasn't changed much. I am going to try an interval of 64 and a divisor of 16.

I'm interested in other performance enhancing ideas that don't involve tweaking tons of options all at the same time. I think my best bet for performance is adding more memory, of course.

Shawn
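The arithmetic behind the "same memory footprint" guess above can be sketched in a few lines of Python. This is only an illustration, assuming that a divisor of N makes the reader keep every Nth entry of the on-disk term index in memory; the function names and the term count are hypothetical and are not part of Solr or Lucene.

```python
def effective_interval(term_index_interval: int, divisor: int = 1) -> int:
    """Spacing, in terms, between index entries held in memory.

    Assumption: the reader loads every `divisor`-th entry of an index
    that was written with one entry per `term_index_interval` terms.
    """
    return term_index_interval * divisor


def in_memory_entries(total_terms: int, term_index_interval: int, divisor: int = 1) -> int:
    """Rough count of term-index entries kept in memory."""
    return total_terms // effective_interval(term_index_interval, divisor)


# Hypothetical term count, chosen only to make the division exact.
TOTAL_TERMS = 10_240_000

for interval, divisor in [(128, 1), (1024, 1), (128, 8), (64, 16)]:
    print(
        f"interval={interval:5d} divisor={divisor:2d} "
        f"effective={effective_interval(interval, divisor):5d} "
        f"entries={in_memory_entries(TOTAL_TERMS, interval, divisor)}"
    )
```

Under that assumption, 128 with a divisor of 8 and 64 with a divisor of 16 both give the same in-memory spacing as a plain interval of 1024. The tii file on disk still differs in size between those settings, because the divisor only affects what is loaded.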
Re: Solr 3.2 filter cache warming taking longer than 1.4.1
OK, I tried a quick test of 1.4.1 vs 3x on optimized indexes (unoptimized had different numbers of segments so I didn't try that). 3x (as of today) was 28% faster at a large filter query (300 terms in one big disjunction, with each term matching ~1000 docs).

-Yonik
http://www.lucidimagination.com

On Thu, Jun 30, 2011 at 3:30 PM, Shawn Heisey wrote:
> On 6/29/2011 10:16 PM, Shawn Heisey wrote:
>>
>> I was thinking perhaps I might actually decrease the termIndexInterval value below the default of 128. I know from reading the Hathi Trust blog that memory usage for the tii file is much more than the size of the file would indicate, but if I increase it from 13MB to 26MB, it probably would still be OK.
>
> Decreasing the termIndexInterval to 64 almost doubled the tii file size, as expected. It made the filterCache warming much faster, but made the queryResultCache warming very very slow. Regular queries also seem like they're slower.
>
> I am trying again with 256. I may go back to the default before I'm done. I'm guessing that a lot of trial and error was put into choosing the default value.
>
> It's been fun having a newer index available on my backup servers. I've been able to do a lot of trials, learned a lot of things that don't work and a few that do. I might do some experiments with trunk once I've moved off 1.4.1.
>
> Thanks,
> Shawn
Re: Solr 3.2 filter cache warming taking longer than 1.4.1
On 6/29/2011 10:16 PM, Shawn Heisey wrote:
> I was thinking perhaps I might actually decrease the termIndexInterval value below the default of 128. I know from reading the Hathi Trust blog that memory usage for the tii file is much more than the size of the file would indicate, but if I increase it from 13MB to 26MB, it probably would still be OK.

Decreasing the termIndexInterval to 64 almost doubled the tii file size, as expected. It made the filterCache warming much faster, but made the queryResultCache warming very very slow. Regular queries also seem like they're slower.

I am trying again with 256. I may go back to the default before I'm done. I'm guessing that a lot of trial and error was put into choosing the default value.

It's been fun having a newer index available on my backup servers. I've been able to do a lot of trials, learned a lot of things that don't work and a few that do. I might do some experiments with trunk once I've moved off 1.4.1.

Thanks,
Shawn
Re: Solr 3.2 filter cache warming taking longer than 1.4.1
On 6/29/2011 7:50 PM, Yonik Seeley wrote:
> OK, your filter queries have hundreds of terms in them (and that means hundreds of term lookups, which uses the term index). Thus, your termIndexInterval change is the leading suspect for the slowdown.
>
> A termIndexInterval of 1024 means that a term lookup will seek to the closest 1024th term and then call next() until the desired term is found. Hence instead of calling next() an average of 64 times internally, it's now 512 times.
>
> Of course there is still a mystery about why your tii (which is the term index) would be so much bigger instead of smaller...

It turns out I got the two indexes backwards, the smaller one was the new index. I may have mixed up the indexes on some of the other files too, but they weren't much different, so I'm not going to try and figure out where any mistakes might be.

Earlier in the afternoon I figured this out, removed termIndexInterval from my config, and rebuilt the index. I had originally put this in to speed up indexing. The evidence I had available at the time told me that this goal was accomplished, but the rebuild actually went faster without the statement.

Warming times are now averaging under 10 seconds even with the warmup count back up to 8. This is still slower than I would like, but it is a major improvement. Even more important, I understand what happened.

I was thinking perhaps I might actually decrease the termIndexInterval value below the default of 128. I know from reading the Hathi Trust blog that memory usage for the tii file is much more than the size of the file would indicate, but if I increase it from 13MB to 26MB, it probably would still be OK.

Are any index intervals for the other Lucene files configurable in a similar manner? I know that screwing too much with the defaults can make things much worse, so I would be very careful with any adjustments, and try to fully understand why any performance gain or loss occurred.

Thanks,
Shawn
Re: Solr 3.2 filter cache warming taking longer than 1.4.1
On Wed, Jun 29, 2011 at 3:28 PM, Yonik Seeley wrote:
>
> On Wed, Jun 29, 2011 at 1:43 PM, Shawn Heisey wrote:
> > Just now, three of the six shards had documents deleted, and they took 29.07, 27.57, and 28.66 seconds to warm. The 1.4.1 counterpart to the 29.07 second one only took 4.78 seconds, and it did twice as many autowarm queries.
>
> Can you post the logs at the INFO level that cover the warming period?

OK, your filter queries have hundreds of terms in them (and that means hundreds of term lookups, which uses the term index). Thus, your termIndexInterval change is the leading suspect for the slowdown.

A termIndexInterval of 1024 means that a term lookup will seek to the closest 1024th term and then call next() until the desired term is found. Hence instead of calling next() an average of 64 times internally, it's now 512 times.

Of course there is still a mystery about why your tii (which is the term index) would be so much bigger instead of smaller...

-Yonik
http://www.lucidimagination.com
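The seek-then-next() behaviour described in this message can be modelled with a small Python sketch. It is a toy model, not Lucene code: the term dictionary is a sorted in-memory list, the "term index" is every Nth term of that list, and all function names are hypothetical. It only counts how many next() steps follow the seek.

```python
from bisect import bisect_right


def build_index(terms, interval):
    """Keep every `interval`-th term (positions 0, interval, 2*interval, ...)."""
    return terms[::interval]


def lookup_scans(terms, index, interval, target):
    """Count the next() steps needed to reach `target`.

    First seek to the closest indexed term at or before `target`,
    then walk forward one term at a time until it is found.
    `target` must be present in `terms`.
    """
    slot = bisect_right(index, target) - 1  # nearest indexed term <= target
    pos = slot * interval                   # its position in the full term list
    scans = 0
    while terms[pos] != target:
        pos += 1
        scans += 1
    return scans


def average_scans(num_terms, interval):
    """Average next() count when every term is looked up exactly once."""
    # Zero-padded names so that string order matches numeric order.
    terms = [f"t{i:07d}" for i in range(num_terms)]
    index = build_index(terms, interval)
    total = sum(lookup_scans(terms, index, interval, t) for t in terms)
    return total / num_terms


print(average_scans(4096, 128))   # 63.5
print(average_scans(4096, 1024))  # 511.5
```

In this model the average is (interval - 1) / 2, which lines up with the figures of roughly 64 and 512 quoted above. A filter query with hundreds of terms pays that cost once per term.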
Re: Solr 3.2 filter cache warming taking longer than 1.4.1
On Wed, Jun 29, 2011 at 1:43 PM, Shawn Heisey wrote:
> Just now, three of the six shards had documents deleted, and they took 29.07, 27.57, and 28.66 seconds to warm. The 1.4.1 counterpart to the 29.07 second one only took 4.78 seconds, and it did twice as many autowarm queries.

Can you post the logs at the INFO level that cover the warming period?

-Yonik
http://www.lucidimagination.com
Re: Solr 3.2 filter cache warming taking longer than 1.4.1
On 6/29/2011 11:27 AM, Shawn Heisey wrote:
> On 6/29/2011 9:17 AM, Yonik Seeley wrote:
>> Hmmm, you could comment out the query and filter caches on both 1.4.1 and 3.2 and then run some of the queries to see if you can figure out which are slower? Do any of the queries have stopwords in fields where you now index those? If so, that could entirely account for the difference.
>
> The query cache warms very quickly, it's the filter cache that's taking forever. I'm not intimately familiar with what is being put in our filter queries by our webapp, but I'd be a little surprised if there are stopwords there. A quick grep through solr logs (when I've turned it up to INFO) for the really common ones didn't reveal any. People do type them in fairly frequently, but they go into q= ... fq values are constructed internally, not from what a user types, and as far as I know, they involve fields that have never had stopwords removed.

I should add that this happens only after the index has had at least a few hundred queries, when deletes are committed. The delete process runs every ten minutes, and checks for document presence before issuing the delete, which avoids unnecessary commits.

Just now, three of the six shards had documents deleted, and they took 29.07, 27.57, and 28.66 seconds to warm. The 1.4.1 counterpart to the 29.07 second one only took 4.78 seconds, and it did twice as many autowarm queries.

I know it's not my single *:* sorted warming query (firstSearcher and newSearcher), because on solr startup with either version, warm time is 0.01 seconds. I have useColdSearcher set to false.

Thanks,
Shawn
Re: Solr 3.2 filter cache warming taking longer than 1.4.1
On 6/29/2011 9:17 AM, Yonik Seeley wrote:
> Hmmm, you could comment out the query and filter caches on both 1.4.1 and 3.2 and then run some of the queries to see if you can figure out which are slower? Do any of the queries have stopwords in fields where you now index those? If so, that could entirely account for the difference.

The query cache warms very quickly, it's the filter cache that's taking forever. I'm not intimately familiar with what is being put in our filter queries by our webapp, but I'd be a little surprised if there are stopwords there. A quick grep through solr logs (when I've turned it up to INFO) for the really common ones didn't reveal any. People do type them in fairly frequently, but they go into q= ... fq values are constructed internally, not from what a user types, and as far as I know, they involve fields that have never had stopwords removed.

I will do some experimentation with your suggestions.

Thanks,
Shawn
Re: Solr 3.2 filter cache warming taking longer than 1.4.1
Hmmm, you could comment out the query and filter caches on both 1.4.1 and 3.2 and then run some of the queries to see if you can figure out which are slower? Do any of the queries have stopwords in fields where you now index those? If so, that could entirely account for the difference.

-Yonik
http://www.lucidimagination.com

On Wed, Jun 29, 2011 at 10:59 AM, Shawn Heisey wrote:
> I have noticed a significant difference in filter cache warming times on my shards between 3.2 and 1.4.1. What can I do to troubleshoot this? Please let me know what additional information you might need to look deeper. I know this isn't enough.
>
> It takes about 3 seconds to do an autowarm count of 8 on 1.4.1 and 10-15 seconds to do an autowarm count of 4 on 3.2. The only explicit warming query is *:*, sorted descending by post_date, a tlong field containing a UNIX timestamp, precisionStep 16. The indexes are not entirely identical, but the new one did evolve from the old one. Perhaps one of the experts might spot something that makes for much slower filter cache warming, or some way to look deeper if this seems wrong? Is there a way to see the search URL bits that populated the cache?
>
> Index differences: The new index has four extra small fields, is no longer removing stopwords, and has omitTermFreqAndPositions enabled on a significant number of fields. Most of the fields are tokenized text, and now more than half of those don't have tf and tp enabled. Naturally the largest text field where most of the matches happen still does have them enabled.
>
> To increase reindex speed, the new index has a termIndexInterval of 1024, the old one is at the default of 128. In terms of raw size, the new index is less than one percent larger than the old one. The old shards average out to 17.22GB, the new ones to 17.41GB. Here's an overview of the differences of each type of file (comparing the huge optimized segment only, not the handful of tiny ones since) on the index with the largest size gap, old value listed first:
>
> fdt: 6317180127/6055634923 (4.1% decrease)
> fdx: 76447972/75647412 (1% decrease)
> fnm: 382/338 (44 bytes! woohoo!)
> frq: 2828400926/2873249038 (1.5% increase)
> nrm: 28367782/38223988 (35% increase)
> prx: 2449154203/2684249069 (9.5% increase)
> tii: 1686298/13329832 (790% increase)
> tis: 923045932/999294109 (8% increase)
> tvd: 18910972/19111840 (1% increase)
> tvf: 5867309063/5640332282 (3.9% decrease)
> tvx: 151294820/152895940 (1% increase)
>
> The tii and nrm files are the only ones that saw a significant size increase, but the tii file is MUCH bigger.
>
> Thanks,
> Shawn
Solr 3.2 filter cache warming taking longer than 1.4.1
I have noticed a significant difference in filter cache warming times on my shards between 3.2 and 1.4.1. What can I do to troubleshoot this? Please let me know what additional information you might need to look deeper. I know this isn't enough.

It takes about 3 seconds to do an autowarm count of 8 on 1.4.1 and 10-15 seconds to do an autowarm count of 4 on 3.2. The only explicit warming query is *:*, sorted descending by post_date, a tlong field containing a UNIX timestamp, precisionStep 16. The indexes are not entirely identical, but the new one did evolve from the old one. Perhaps one of the experts might spot something that makes for much slower filter cache warming, or some way to look deeper if this seems wrong? Is there a way to see the search URL bits that populated the cache?

Index differences: The new index has four extra small fields, is no longer removing stopwords, and has omitTermFreqAndPositions enabled on a significant number of fields. Most of the fields are tokenized text, and now more than half of those don't have tf and tp enabled. Naturally the largest text field where most of the matches happen still does have them enabled.

To increase reindex speed, the new index has a termIndexInterval of 1024, the old one is at the default of 128. In terms of raw size, the new index is less than one percent larger than the old one. The old shards average out to 17.22GB, the new ones to 17.41GB. Here's an overview of the differences of each type of file (comparing the huge optimized segment only, not the handful of tiny ones since) on the index with the largest size gap, old value listed first:

fdt: 6317180127/6055634923 (4.1% decrease)
fdx: 76447972/75647412 (1% decrease)
fnm: 382/338 (44 bytes! woohoo!)
frq: 2828400926/2873249038 (1.5% increase)
nrm: 28367782/38223988 (35% increase)
prx: 2449154203/2684249069 (9.5% increase)
tii: 1686298/13329832 (790% increase)
tis: 923045932/999294109 (8% increase)
tvd: 18910972/19111840 (1% increase)
tvf: 5867309063/5640332282 (3.9% decrease)
tvx: 151294820/152895940 (1% increase)

The tii and nrm files are the only ones that saw a significant size increase, but the tii file is MUCH bigger.

Thanks,
Shawn
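For anyone wanting to recheck the percentages in the file listing, a minimal Python sketch follows. The byte counts are copied from the post; the helper name is hypothetical. Computed as (new - old) / old, the tii change comes out near 690%, so the 790% in the listing appears to be the new size expressed as a percentage of the old one.

```python
def percent_change(old: int, new: int) -> float:
    """Relative change from `old` to `new`, in percent (negative means a decrease)."""
    return (new - old) / old * 100.0


# (old bytes, new bytes) for three of the files listed in the post.
sizes = {
    "fdt": (6317180127, 6055634923),
    "nrm": (28367782, 38223988),
    "tii": (1686298, 13329832),
}

for ext, (old, new) in sizes.items():
    print(f"{ext}: {percent_change(old, new):+.1f}% (new/old = {new / old:.2f}x)")
```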