Re: Facet shows deleted values...
bq: And I also read somewhere that explicit commit is not recommended in SolrCloud mode

Not quite. It's just easy to have too many commits happen too frequently when multiple indexing clients issue them. It's also rare that the benefit of clients issuing commits outweighs the chance of getting it wrong.

It's not so much that it's not recommended as that it's usually not necessary at all and easy to get wrong.

Best,
Erick

On Mon, Jan 4, 2016 at 5:15 PM, Shawn Heisey wrote:
> [...]
Re: Facet shows deleted values...
On 1/4/2016 4:11 PM, Don Bosco Durai wrote:
> Erick, I am using SolrCloud with solrconfig.xml configured with autoCommit.
> And I also read somewhere that explicit commit is not recommended in
> SolrCloud mode. Regarding auto warm, my server has been running for a
> while.

Since 4.0, autoCommit with openSearcher set to false is highly recommended, no matter what your needs are regarding visibility, and whether or not you're running in cloud mode. The exact interval to use is a subject for vigorous debate. A common maxTime value that you will see for autoCommit is 15 seconds (15000). I personally feel this is too frequent, but many people use that value with no problems. I use five minutes (300000) in my own config, but over the course of those five minutes, there's not much in the way of updates, so the log replay will take very little time. Using autoCommit with openSearcher set to false takes care of transaction log rotation; it doesn't do ANYTHING for document visibility.

The issue of how to handle document visibility will depend on exactly how you use your index. Do not worry about whether the index is SolrCloud or not for this topic.

One way of handling document visibility is to use autoSoftCommit (available since 4.0) in your config, with maxTime set to the longest possible interval you can stand. My personal recommendation is to never set that interval shorter than one minute (60000). Push back if you are told that documents must be visible faster than that. If you use autoSoftCommit, you won't need explicit commits from your indexing application.

Another way to handle document visibility is the commitWithin parameter on each update request. This is similar to autoSoftCommit, but gets set on the update request. Just like autoSoftCommit, I would not recommend a value less than one minute, and if this parameter is used on all updates, you will never need an explicit commit.

Using autoSoftCommit or commitWithin is a good option if there are many clients/threads sending changes to the same index, or if the indexing happens in bursts where the update size is wildly different and completely unpredictable.

The final way to handle document visibility is explicit commits. When you want changes to be visible, you send a commit, hard or soft, with openSearcher set to true (this is the default for this parameter), and a short time later, all changes sent before that commit will become visible. This is how I handle my own index. This is a good option if all indexing is coming from a single source and that source has complete control over all indexing operations.

One of the strong goals with commits is to avoid them happening too frequently, so they don't overlap, and so the machine is spending less time handling commits than it spends either idle or handling queries.

Here's a blog post with more detail. The blog post says "SolrCloud" but almost all of it is equally applicable to Solr 4.x and 5.x indexes that are not running in cloud mode:

http://lucidworks.com/blog/understanding-transaction-logs-softcommit-and-commit-in-sorlcloud/

Thanks,
Shawn
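As an illustration of the intervals and the commitWithin option described above, here is a minimal Python sketch (not part of the original message). It converts the intervals to the millisecond values that maxTime and commitWithin take, and builds, without sending, an update URL that carries commitWithin. The base URL and collection name are hypothetical placeholders, and the helper names are made up for this example.

```python
from urllib.parse import urlencode


def minutes_to_ms(minutes: float) -> int:
    """Convert minutes to milliseconds, the unit used by maxTime and commitWithin."""
    return int(minutes * 60 * 1000)


# Intervals taken from the message above.
AUTO_COMMIT_MAX_TIME = minutes_to_ms(5)       # hard commit with openSearcher=false
AUTO_SOFT_COMMIT_MAX_TIME = minutes_to_ms(1)  # suggested lower bound for visibility


def update_url(base_url: str, collection: str, commit_within_ms: int) -> str:
    """Build (but do not send) an update URL that sets commitWithin.

    base_url and collection are hypothetical; adjust them to your deployment.
    """
    query = urlencode({"commitWithin": commit_within_ms})
    return f"{base_url}/{collection}/update?{query}"


if __name__ == "__main__":
    print(AUTO_COMMIT_MAX_TIME)
    print(AUTO_SOFT_COMMIT_MAX_TIME)
    print(update_url("http://localhost:8983/solr", "mycollection", minutes_to_ms(1)))
```

If every update request is built this way, the indexing client never has to issue a commit of its own, which is the point made above about commitWithin.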
Re: Facet shows deleted values...
Tomás, thanks for the suggestion. facet.mincount will solve my issue.

Erick, I am using SolrCloud with solrconfig.xml configured with autoCommit. I also read somewhere that explicit commits are not recommended in SolrCloud mode. Regarding autowarming, my server has been running for a while.

I lost my environment during the holidays. I will rebuild it and monitor this further. I will also try an explicit commit() to see if that helps.

Thanks

Bosco

On 12/29/15, 5:48 PM, "Tomás Fernández Löbbe" wrote:
> [...]
Re: Facet shows deleted values...
I believe the problem here is that terms from the deleted docs still appear in the facets, even with a doc count of 0. Is that it? Can you use facet.mincount=1, or would that not be a good fit for your use case?

https://cwiki.apache.org/confluence/display/solr/Faceting#Faceting-Thefacet.mincountParameter

Tomás

On Tue, Dec 29, 2015 at 5:23 PM, Erick Erickson wrote:
> [...]
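To illustrate the facet.mincount=1 suggestion above, here is a small Python sketch (not part of the original message). It builds, without sending, a facet query URL with facet.mincount set, and shows the equivalent client-side filtering on a facet list in Solr's default flat `[term, count, term, count, ...]` layout. The base URL, collection, and field name are hypothetical, as are the helper names.

```python
from urllib.parse import urlencode


def facet_query_url(base_url: str, collection: str, facet_field: str, mincount: int = 1) -> str:
    """Build (but do not send) a facet query that hides terms below mincount.

    base_url, collection, and facet_field are hypothetical placeholders.
    """
    params = [
        ("q", "*:*"),
        ("rows", 0),  # only the facet counts are of interest here
        ("facet", "true"),
        ("facet.field", facet_field),
        ("facet.mincount", mincount),
    ]
    return f"{base_url}/{collection}/select?{urlencode(params)}"


def drop_zero_counts(flat_facets: list) -> dict:
    """Client-side equivalent of facet.mincount=1 for a flat [term, count, ...] list."""
    terms = flat_facets[0::2]
    counts = flat_facets[1::2]
    return {term: count for term, count in zip(terms, counts) if count >= 1}


if __name__ == "__main__":
    print(facet_query_url("http://localhost:8983/solr", "mycollection", "category"))
    print(drop_zero_counts(["books", 3, "purged", 0]))
```

Letting Solr apply facet.mincount is usually simpler than filtering on the client; the second function is only there to show what the parameter removes.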
Re: Facet shows deleted values...
Let's be sure we're using terms similarly.

That article is from 2010, so it is unreliable in the 5.2 world; I'd ignore it.

First, facets should always reflect the latest commit, regardless of expungeDeletes or optimizes/forcemerges.

_commits_ are definitely recommended. Optimize/forcemerge (or expungeDeletes) are rarely necessary and should _not_ be necessary for facets to not count omitted documents.

Is it possible that your autowarm period is long and you're still getting an old searcher when you run your tests?

Assuming that you commit(), then wait a few minutes, do you see inaccurate facets? If so, what are the exact steps you follow?

Best,
Erick

On Tue, Dec 29, 2015 at 12:54 PM, Don Bosco Durai wrote:
> I am purging some of my data on a regular basis, but when I run a facet query,
> the deleted values are still shown in the facet list.
>
> It seems that a commit with expunge resolves this issue
> (http://grokbase.com/t/lucene/solr-user/106313v302/deleted-documents-appearing-in-facet-fields).
> But it seems commit is no longer recommended. Also, I am running Solr 5.2
> in SolrCloud mode.
>
> What is the recommendation here?
>
> Thanks
>
> Bosco