Re: Solr add document over 20 times slower after upgrade from 4.0 to 4.9
Hi Shawn,

Thanks for your reply. My Solr box has 12 GB of physical memory, with 4 GB for Java (-Xmx4096m). The index is around 4 GB in Solr 4.9; I think it was over 6 GB in Solr 4.0.

I do think the Java heap size is one of the reasons for this slowness. I'm doing one big commit, and by the time the ingestion process is 50% done, I can see the Solr server already using over 90% of total memory. I'll try assigning more RAM to the Solr JVM. But from your experience, does 4 GB sound like a good Java heap size for my scenario? Is there any way to reduce memory usage at index time? (One thing I know of is doing several commits instead of one.) My concern is that with 12 GB in total, if I assign too much to the Solr server, I may not leave enough for the OS to cache the Solr index files.

I had a look at the Solr config file but couldn't find anything obviously wrong. Which parts of that config file would affect indexing time?

Thanks,
Ryan
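Putting Ryan's numbers into Shawn's point that the index should fit in total RAM minus the Java heap gives a rough budget. This is a back-of-the-envelope sketch that assumes nothing else on the box uses significant memory; the 6 GB heap line is a hypothetical alternative for comparison, not a recommendation from the thread.

```latex
\[
\begin{aligned}
\text{RAM for OS cache (4 GB heap)} &\approx 12\ \text{GB} - 4\ \text{GB} = 8\ \text{GB}
  \;\ge\; 4\ \text{GB (4.9 index)},\ 6\ \text{GB (4.0 index)} \\
\text{RAM for OS cache (6 GB heap, hypothetical)} &\approx 12\ \text{GB} - 6\ \text{GB} = 6\ \text{GB}
  \;\ge\; 4\ \text{GB (4.9 index)}
\end{aligned}
\]
```

On these figures, both budgets still leave room to cache the whole 4.9 index, so a modest heap increase would not obviously starve the OS cache; whether 4 GB is enough heap to avoid garbage-collection trouble is a separate question.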
RE: Solr add document over 20 times slower after upgrade from 4.0 to 4.9
Hi Erick,

As Ryan Ernst noticed, those big fields (e.g. majorTextSignalStem) are not stored. There are a few stored fields in my schema, but they are very small, basically the name or id of the document. I tried turning them off (storing only the id field), and that didn't make any difference.

Thanks,
Ryan
Re: Solr add document over 20 times slower after upgrade from 4.0 to 4.9
Hi guys,

Just an update: I've tried Solr 4.10 (with the same code I used for Solr 4.9), and it indexes as fast as 4.0. The only problem left is that Solr 4.10 uses more memory than 4.0, so I'm trying to figure out the best Java heap size. I think this shows there is a performance issue in Solr 4.9 when indexing big documents (even ones just over 1 MB).

Thanks,
Ryan
Re: Solr add document over 20 times slower after upgrade from 4.0 to 4.9
Why do one big commit? You could do hard commits along the way but keep the current searcher open, so the changes don't become visible until the end. Obviously that's a separate issue from the memory consumption discussion, but I thought I'd add it anyway.

Regards,
Alex

On 05/09/2014 3:30 am, Li, Ryan ryan...@sensis.com.au wrote:
[...]
Re: Solr add document over 20 times slower after upgrade from 4.0 to 4.9
On Fri, Sep 5, 2014 at 3:22 PM, Alexandre Rafalovitch arafa...@gmail.com wrote:
Why do one big commit? You could do hard commits along the way but keep the current searcher open, so the changes don't become visible until the end.

Alexandre, I don't think that can happen: the next search picks up the new searcher.

Ryan, normally the commit schedule is driven by application requirements, i.e. when updates need to become visible. Memory consumption during indexing is governed by ramBufferSizeMB and maxIndexingThreads. Exceeding the buffer causes a flush to disk, but doesn't trigger a commit.

[...]

--
Sincerely yours
Mikhail Khludnev
Principal Engineer, Grid Dynamics
http://www.griddynamics.com
mkhlud...@griddynamics.com
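The two settings Mikhail mentions live in the indexConfig section of solrconfig.xml. A minimal sketch, assuming Solr 4.x; the values are what I understand the 4.x defaults to be, shown for orientation rather than as tuned recommendations for Ryan's setup.

```xml
<!-- solrconfig.xml (Solr 4.x) sketch: values are believed to be the defaults -->
<indexConfig>
  <!-- When the RAM used to buffer added documents exceeds this limit,
       Lucene flushes the buffered documents to a new segment on disk.
       A flush is not a commit: nothing new becomes visible to searches. -->
  <ramBufferSizeMB>100</ramBufferSizeMB>

  <!-- Upper bound on the number of threads indexing concurrently
       inside Lucene's IndexWriter. -->
  <maxIndexingThreads>8</maxIndexingThreads>
</indexConfig>
```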
Re: Solr add document over 20 times slower after upgrade from 4.0 to 4.9
On Fri, Sep 5, 2014 at 9:55 AM, Mikhail Khludnev mkhlud...@griddynamics.com wrote:
Why do one big commit? You could do hard commits along the way but keep the current searcher open, so the changes don't become visible until the end.
Alexandre, I don't think that can happen: the next search picks up the new searcher.

Why not? Isn't that what the Solr example configuration is doing here?
https://github.com/apache/lucene-solr/blob/lucene_solr_4_10_0/solr/example/solr/collection1/conf/solrconfig.xml#L386
A hard commit does not reopen the searcher. The soft commit does (further down), but that can be disabled to get the effect I am proposing. What am I missing?

Regards,
Alex.
Personal: http://www.outerthoughts.com/ and @arafalov
Solr resources and newsletter: http://www.solr-start.com/ and @solrstart
Solr popularizers community: https://www.linkedin.com/groups?gid=6713853
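Alex's proposal corresponds roughly to the following updateHandler fragment for solrconfig.xml, modelled on the 4.10 example configuration he links. It is a sketch: the 15-second interval is illustrative, and the effect hinges on openSearcher being false, which is exactly the setting Erick's reply turns on.

```xml
<!-- solrconfig.xml sketch: periodic hard commits that keep the old searcher -->
<updateHandler class="solr.DirectUpdateHandler2">
  <autoCommit>
    <!-- Hard commit every 15 s (illustrative value): makes indexed
         segments durable on disk and rolls over the transaction log. -->
    <maxTime>15000</maxTime>
    <!-- Do not open a new searcher, so queries keep seeing the
         pre-ingestion view of the index. -->
    <openSearcher>false</openSearcher>
  </autoCommit>

  <!-- A maxTime of -1 disables automatic soft commits. -->
  <autoSoftCommit>
    <maxTime>-1</maxTime>
  </autoSoftCommit>
</updateHandler>
```

With this in place, the indexing client would send one explicit commit at the end of ingestion (for example an update request with commit=true), which opens a searcher that sees all the new documents.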
Re: Solr add document over 20 times slower after upgrade from 4.0 to 4.9
Alexandre:

It Depends (tm), of course. It all hinges on the openSearcher setting in autoCommit: whether it is true or false. In the former case you, well, open a new searcher; in the latter you don't.

I agree, though, that this is all tangential to the memory consumption issue, since the RAM buffer will be flushed regardless of these settings.

FWIW,
Erick

On Fri, Sep 5, 2014 at 7:11 AM, Alexandre Rafalovitch arafa...@gmail.com wrote:
[...]
Re: Solr add document over 20 times slower after upgrade from 4.0 to 4.9
On 9/3/2014 8:14 PM, Li, Ryan wrote:
[...]

One possible source of problems with that particular upgrade is that stored field compression was added in 4.1, and term vector compression was added in 4.2. Both are on by default and cannot be turned off. The compression is typically fast, but with very large documents like yours, it might add major computational overhead. It can also require additional Java heap, which ties into the next point.

Another problem might be RAM-related. If your Java heap is very large, or just a little too small, there can be major performance issues from garbage collection. Since the earlier version performed well, a too-small heap is more likely than a very large one.

If your index is too large to be effectively cached by the machine's total RAM (minus the Java heap assigned to Solr), that can cause performance problems. Your index is likely several gigabytes in size, and might even reach double-digit gigabytes.

Can you share those numbers: index size, Java heap size, and total system RAM? If you can, it would also be a good idea to share your solrconfig.xml. Here's a wiki page that goes into more detail about possible performance issues. It doesn't mention the possible compression problem:
http://wiki.apache.org/solr/SolrPerformanceProblems

Thanks,
Shawn
Re: Solr add document over 20 times slower after upgrade from 4.0 to 4.9
Ryan:

As it happens, there's a discussion on the dev list about this. If at all possible, could you try a brief experiment? Turn off all storage, i.e. set stored=false on all fields. It's a lot to ask, but it would help the discussion. Or join the discussion at https://issues.apache.org/jira/browse/LUCENE-5914.

Best,
Erick

On Thu, Sep 4, 2014 at 1:08 AM, Shawn Heisey s...@elyograg.org wrote:
[...]
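Erick's experiment amounts to flipping the stored attribute on field definitions in schema.xml. A sketch, assuming an id uniqueKey of the stock string type; the name field is a hypothetical stand-in for Ryan's small stored fields. This variant keeps id stored, as Ryan later did; Erick's literal request would set stored="false" on id as well.

```xml
<!-- schema.xml sketch: only the uniqueKey stays stored -->
<field name="id" type="string" indexed="true" stored="true" required="true"/>

<!-- hypothetical small field, previously stored="true" -->
<field name="name" type="string" indexed="true" stored="false"/>

<!-- the large field from Ryan's schema, already not stored -->
<field name="majorTextSignalStem" type="text_stem" indexed="true" stored="false"
       multiValued="true" omitNorms="false"/>
```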
Solr add document over 20 times slower after upgrade from 4.0 to 4.9
I am indexing 2500 documents (up to 50MB each, 3MB on average) into a Solr server. On Solr 4.0 the indexing finished in 3 hours. After we upgraded to Solr 4.9, it takes 3 days to finish.

I've done some profiling; these are the numbers I get:

Document size (MB)   Time to add (Solr 4.0)   Time to add (Solr 4.9)
1.18                 6 sec                    123 sec
2.26                 12 sec                   444 sec
3.35                 18 sec                   over 600 sec
9.65                 46 sec                   timeout

From what I can see, indexing time grows roughly linearly with document size, O(n), on Solr 4.0, but much faster than linearly, closer to O(n^2), on Solr 4.9.

I also tried commenting out some of the copied fields to narrow down the problem. It seems the size of the document after indexing (we copy fields, and the more fields we copy, the bigger the index gets) is the dominant factor in indexing time. Has anyone experienced a similar problem? Does this sound like a bug in Solr, or are we just using Solr 4.9 wrong?

Here is one example of a field definition in my schema file:

<fieldType name="text_stem" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <charFilter class="solr.HTMLStripCharFilterFactory"/>
    <charFilter class="solr.PatternReplaceCharFilterFactory" pattern="'+" replacement=""/>
    <!-- strip off all apostrophe (') characters -->
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.ASCIIFoldingFilterFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.SynonymFilterFactory" expand="true" ignoreCase="true" synonyms="../../resources/type-index-synonyms.txt"/>
    <filter class="solr.SnowballPorterFilterFactory" language="English"/>
    <!-- Used to have language=English - seems this param is gone in 4.9 -->
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <charFilter class="solr.HTMLStripCharFilterFactory"/>
    <charFilter class="solr.PatternReplaceCharFilterFactory" pattern="'+" replacement=""/>
    <!-- strip off all apostrophe (') characters -->
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.ASCIIFoldingFilterFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.SynonymFilterFactory" expand="true" ignoreCase="true" synonyms="../../resources/type-query-colloq-synonyms.txt"/>
    <filter class="solr.SnowballPorterFilterFactory" language="English"/>
    <!-- Used to have language=English - seems this param is gone in 4.9 -->
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
</fieldType>

Field:
<field name="majorTextSignalStem" type="text_stem" indexed="true" stored="false" multiValued="true" omitNorms="false"/>

Copy:
<copyField dest="majorTextSignalStem" source="majorTextSignalRaw"/>

Thanks,
Ryan
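A quick way to check the scaling against Ryan's profiling table: if indexing time grows as t ∝ s^k for document size s, any two rows give an estimate of the exponent k. This sketch uses only the first two rows, since the 3.35 MB figure on 4.9 is only a lower bound and the 9.65 MB run timed out.

```latex
\[
k = \frac{\ln(t_2/t_1)}{\ln(s_2/s_1)}, \qquad
k_{4.0} = \frac{\ln(12/6)}{\ln(2.26/1.18)} \approx \frac{0.693}{0.650} \approx 1.07, \qquad
k_{4.9} = \frac{\ln(444/123)}{\ln(2.26/1.18)} \approx \frac{1.284}{0.650} \approx 1.98
\]
```

As a consistency check, extrapolating to 3.35 MB gives about 6 × (3.35/1.18)^1.07 ≈ 18 s for 4.0, matching the measured 18 sec, and about 123 × (3.35/1.18)^1.98 ≈ 970 s for 4.9, consistent with "over 600 sec". So 4.0 scales roughly linearly and 4.9 roughly quadratically over this range.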