Pagination issue when grouping
Hello,

I group search results by a high-cardinality field and paginate the search page by the number of groups, which I get with the param group.ngroups=true. But that causes high CPU usage, so I turned it off.

Without ngroups=true I can't get the number of groups, so pagination is not correct: I have to use numFound, and the last pages always come up empty because some results were already collapsed into groups on previous pages.

For example, a search returns 11 results, but 2 of them belong to the same group, so there are 10 groups (which I don't know in advance because I set ngroups=false). With 11 results, pagination displays 2 pages, but page 2 has 0 results.

Has anyone faced a similar issue and found a workaround?

Thanks,
Tien
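To make the arithmetic in the example concrete, here is a minimal Python sketch. It assumes a page size of 10 groups (implied by the example, not stated) and uses a hypothetical `fetch_groups` stand-in instead of a real search request. The second half shows one possible look-ahead workaround: ask for one extra group per page and use its presence to decide whether a next page exists, so the total group count is never needed. It is only an illustration, not a tested Solr recipe.

```python
import math


def page_count(total_items, page_size):
    """Number of pages needed to show total_items, page_size per page."""
    return math.ceil(total_items / page_size)


def fetch_groups(all_groups, start, rows):
    """Hypothetical stand-in for a grouped search request.

    Returns up to `rows` groups beginning at group offset `start`.
    """
    return all_groups[start:start + rows]


def get_page(all_groups, page, page_size):
    """Look-ahead paging: fetch page_size + 1 groups, show page_size.

    The extra group is never displayed; it only tells us whether
    another page exists.
    """
    fetched = fetch_groups(all_groups, page * page_size, page_size + 1)
    return fetched[:page_size], len(fetched) > page_size


# The numbers from the message: 11 matching documents, 10 groups.
num_found = 11
num_groups = 10
page_size = 10  # assumption: 10 groups per page

pages_by_num_found = page_count(num_found, page_size)  # what the UI shows
pages_by_groups = page_count(num_groups, page_size)    # what it should show

groups = [f"group-{i}" for i in range(num_groups)]
first_page, has_next = get_page(groups, 0, page_size)
```

With look-ahead paging the UI can offer "next" and "previous" links but not a total page count, which is the trade-off for skipping the group count.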
Re: High CPU when use grouping group.ngroups=true
Without using ngroups=true, is there any way to handle pagination correctly when we collapse results using grouping?

Regards,
Tien

On Tue, May 23, 2017 at 9:55 PM, Nguyen Manh Tien wrote:
> The collapse field is a high-cardinality field. I haven't profiled yet, but
> I will.
>
> Thanks,
> Tien
>
> On Tue, May 23, 2017 at 9:48 PM, Erick Erickson wrote:
>> How many unique values in your group field? For high-cardinality
>> fields there's quite a bit of bookkeeping that needs to be done.
>>
>> Have you tried profiling to see where the CPU time is being spent?
>>
>> Best,
>> Erick
>>
>> On Tue, May 23, 2017 at 7:46 AM, Nguyen Manh Tien wrote:
>> > Hi All,
>> >
>> > I recently switched from Solr field collapse/expand to grouping for
>> > collapsing search results.
>> > All seems good, but CPU is always high (80-100%) when I set the param
>> > group.ngroups=true.
>> >
>> > We set ngroups=true to get the number of groups so that we can paginate
>> > search results correctly.
>> > Due to the CPU issue we need to turn it off.
>> >
>> > Is ngroups=true an expensive feature? Is there any way to prevent the
>> > CPU issue and still have correct pagination?
>> >
>> > Thanks,
>> > Tien
Re: High CPU when use grouping group.ngroups=true
The collapse field is a high-cardinality field. I haven't profiled yet, but I will.

Thanks,
Tien

On Tue, May 23, 2017 at 9:48 PM, Erick Erickson wrote:
> How many unique values in your group field? For high-cardinality
> fields there's quite a bit of bookkeeping that needs to be done.
>
> Have you tried profiling to see where the CPU time is being spent?
>
> Best,
> Erick
>
> On Tue, May 23, 2017 at 7:46 AM, Nguyen Manh Tien wrote:
> > Hi All,
> >
> > I recently switched from Solr field collapse/expand to grouping for
> > collapsing search results.
> > All seems good, but CPU is always high (80-100%) when I set the param
> > group.ngroups=true.
> >
> > We set ngroups=true to get the number of groups so that we can paginate
> > search results correctly.
> > Due to the CPU issue we need to turn it off.
> >
> > Is ngroups=true an expensive feature? Is there any way to prevent the
> > CPU issue and still have correct pagination?
> >
> > Thanks,
> > Tien
High CPU when use grouping group.ngroups=true
Hi All,

I recently switched from Solr field collapse/expand to grouping for collapsing search results. All seems good, but CPU is always high (80-100%) when I set the param group.ngroups=true.

We set ngroups=true to get the number of groups so that we can paginate search results correctly. Due to the CPU issue we need to turn it off.

Is ngroups=true an expensive feature? Is there any way to prevent the CPU issue and still have correct pagination?

Thanks,
Tien
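For readers who want to see the kind of request being discussed, the sketch below builds a grouped query URL in Python. The parameter names (group, group.field, group.ngroups, start, rows) are Solr's result grouping parameters; the host, core name, and field name are made up for illustration, and the exact request used in this thread is not known.

```python
from urllib.parse import urlencode


def grouped_query_url(base_url, query, group_field, page, page_size, with_ngroups):
    """Build a select URL for one page of grouped results.

    With grouping enabled, start and rows count groups, not documents.
    """
    params = {
        "q": query,
        "group": "true",
        "group.field": group_field,
        # Asking for the total group count is the costly part in this thread.
        "group.ngroups": "true" if with_ngroups else "false",
        "start": page * page_size,
        "rows": page_size,
        "wt": "json",
    }
    return f"{base_url}?{urlencode(params)}"


# Hypothetical host, core, and field names.
url = grouped_query_url(
    "http://localhost:8983/solr/mycore/select",
    "laptop",
    "seller_id",
    page=2,
    page_size=10,
    with_ngroups=True,
)
```

Toggling `with_ngroups` is the only difference between the slow and the fast variant of the request, which makes it easy to compare the two under a profiler.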
Re: Explicit OR in edismax query with mm=100%
Hi,

In our case, mm=100% is fixed; it works well for many other queries. I just need an option in edismax so that mm is ignored for a query such as "Solr OR Lucene" with an explicit OR.

Thanks,
Tien

On Thu, Apr 20, 2017 at 9:56 AM, Yasufumi Mizoguchi wrote:
> Hi,
>
> It looks like edismax respects the mm parameter in your case.
> You should set "mm=1" if you want to obtain the results of an OR search.
> "mm=100%" means that all terms in your query should match.
>
> Regards,
> Yasufumi
>
> On 2017/04/20 10:40, Nguyen Manh Tien wrote:
>> Hi,
>>
>> I ran the query "Solr OR Lucene" with defType=edismax and mm=100%.
>> The search results show that the query works like "Solr AND Lucene" (all
>> terms required).
>>
>> Does edismax ignore the mm parameter because I already use OR explicitly here?
>>
>> Thanks,
>> Tien
Explicit OR in edismax query with mm=100%
Hi,

I ran the query "Solr OR Lucene" with defType=edismax and mm=100%. The search results show that the query works like "Solr AND Lucene" (all terms required).

Does edismax ignore the mm parameter because I already use OR explicitly here?

Thanks,
Tien
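The observed behavior is what one would expect if mm is applied to the two terms as optional clauses. The Python sketch below is a simplified model of that idea only: it handles a plain non-negative integer or a positive percentage (rounded down), and it does not claim to reproduce how edismax parses explicit operators. The function names are hypothetical.

```python
def required_clauses(mm, optional_clauses):
    """Minimum number of optional clauses that must match.

    Simplified model: supports only "N" and "P%" forms of mm.
    A percentage is rounded down to a whole number of clauses.
    """
    if mm.endswith("%"):
        percent = int(mm[:-1])
        return (optional_clauses * percent) // 100
    return min(int(mm), optional_clauses)


def matches(doc_terms, query_terms, mm):
    """True if the document contains enough of the query terms."""
    needed = required_clauses(mm, len(query_terms))
    hits = sum(1 for term in query_terms if term in doc_terms)
    return hits >= needed


query = ["solr", "lucene"]
doc_with_one_term = {"solr", "search"}
doc_with_both_terms = {"solr", "lucene"}

strict = matches(doc_with_one_term, query, "100%")   # both terms needed
relaxed = matches(doc_with_one_term, query, "1")     # one term is enough
```

Under this model, mm=100% on two clauses requires both of them, which is indistinguishable from "Solr AND Lucene", while mm=1 gives the usual OR behavior.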
Re: Increasing number of SolrIndexSearcher (Leakage)?
I found that a custom component caused the issue. It creates a SolrQueryRequest but doesn't close it at the end, so the reference count on the SolrIndexSearcher never drops to 0 and the SIS is not released.

On Tue, Feb 18, 2014 at 9:31 PM, Yonik Seeley wrote:
> On Mon, Feb 17, 2014 at 1:34 AM, Nguyen Manh Tien wrote:
> > - *But after I index some docs and run a softCommit or a hardCommit with
> > openSearcher=false, the number of SolrIndexSearcher instances increases by 1.*
>
> This is fine... it's more of an internal implementation detail (we
> open what is called a "real-time" searcher so we can drop some other
> data structures like the list of non-visible document updates, etc).
> If you did the commit again, the count should not continue to
> increase.
>
> If the number of searchers continues to increase, you have a searcher
> leak due to something else.
> Are you using any custom components or anything else that isn't stock Solr?
>
> -Yonik
> http://heliosearch.org - native off-heap filters and fieldcache for solr
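The leak described above can be illustrated with a toy reference-counting model in Python. This is not Solr code: the `Searcher` and `Request` classes are hypothetical stand-ins that only mimic the idea that a request holds a reference to a searcher until it is closed, and that the searcher is released when its count reaches 0.

```python
class Searcher:
    """Toy searcher that is released when its reference count hits 0."""

    def __init__(self, name):
        self.name = name
        self.refcount = 1  # assumption: the core itself holds one reference
        self.closed = False

    def incref(self):
        self.refcount += 1

    def decref(self):
        self.refcount -= 1
        if self.refcount == 0:
            self.closed = True  # resources would be freed here


class Request:
    """Toy request that pins a searcher until close() is called."""

    def __init__(self, searcher):
        self.searcher = searcher
        self._closed = False
        searcher.incref()

    def close(self):
        if not self._closed:
            self._closed = True
            self.searcher.decref()

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        self.close()
        return False


def leaky_component(searcher):
    Request(searcher)  # bug: the request is never closed


def careful_component(searcher):
    with Request(searcher):
        pass  # work happens here; close() runs even if it raises


def retire(searcher):
    """The core drops its own reference when a new searcher replaces this one."""
    searcher.decref()


leaked = Searcher("leaked")
leaky_component(leaked)
retire(leaked)

released = Searcher("released")
careful_component(released)
retire(released)
```

In the leaky case the searcher stays alive after it has been replaced, so every commit that opens a new searcher leaves another one behind. In Java the equivalent fix would be to close the request in a finally block.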
Re: Solr index filename doesn't match with solr version
Thanks Shawn and Tri for the information and explanation.

Tien

On Mon, Feb 17, 2014 at 1:36 PM, Tri Cao wrote:
> Lucene main file formats actually don't change a lot in 4.x (or even 5.x),
> and the newer codecs just delegate to previous versions for most file
> types. The newer file types don't typically include Lucene's version in
> file names.
>
> For example, the Lucene 4.6 codec basically delegates the stored fields and
> term vector file formats to 4.1, the doc format to 4.0, etc., and only
> implements the new segment info/field info formats (the .si and .fnm files).
>
> https://github.com/apache/lucene-solr/blob/lucene_solr_4_6/lucene/core/src/java/org/apache/lucene/codecs/lucene46/Lucene46Codec.java#L50
>
> Hope this helps,
> Tri
>
> On Feb 16, 2014, at 08:52 PM, Shawn Heisey wrote:
>
> On 2/16/2014 7:25 PM, Nguyen Manh Tien wrote:
> > I recently upgraded from Solr 4.0 to Solr 4.6.
> > I checked the Solr index folder and found these files:
> > _aars_*Lucene41*_0.doc
> > _aars_*Lucene41*_0.pos
> > _aars_*Lucene41*_0.tim
> > _aars_*Lucene41*_0.tip
> > I don't know why they don't have *Lucene46* in the file name.
>
> This is an indication that this part of the index is using a file format
> introduced in Lucene 4.1.
>
> Here's what I have for one of my index segments on a Solr 4.6.1 server:
>
> _5s7_2h.del
> _5s7.fdt
> _5s7.fdx
> _5s7.fnm
> _5s7_Lucene41_0.doc
> _5s7_Lucene41_0.pos
> _5s7_Lucene41_0.tim
> _5s7_Lucene41_0.tip
> _5s7_Lucene45_0.dvd
> _5s7_Lucene45_0.dvm
> _5s7.nvd
> _5s7.nvm
> _5s7.si
> _5s7.tvd
> _5s7.tvx
>
> It shows the same pieces as your list, but I am also using docValues in
> my index, and those files indicate that they are using the format from
> Lucene 4.5. I'm not sure why there are not version numbers in *all* of
> the file extensions -- that happens in the Lucene layer, which is a bit
> of a mystery to me.
>
> Thanks,
> Shawn
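As a small illustration of the point made in the replies, the Python sketch below pulls the format token (for example Lucene41) out of segment file names like the ones listed in this thread and groups the files by it. The regular expression is an assumption based only on those names, not on a documented Lucene naming rule; files without a token are grouped under None.

```python
import re
from collections import defaultdict

# Assumed pattern: _<segment>_<LuceneNN>_<suffix>.<ext>, taken from the
# file names quoted in this thread.
FORMAT_TOKEN = re.compile(r"^_(?P<segment>[0-9a-z]+)_(?P<format>Lucene\d+)_")


def format_of(filename):
    """Return the format token in a segment file name, or None."""
    match = FORMAT_TOKEN.match(filename)
    return match.group("format") if match else None


def group_by_format(filenames):
    """Map each format token (or None) to the files that carry it."""
    groups = defaultdict(list)
    for name in filenames:
        groups[format_of(name)].append(name)
    return dict(groups)


files = [
    "_5s7_2h.del",
    "_5s7.fdt",
    "_5s7_Lucene41_0.doc",
    "_5s7_Lucene41_0.tim",
    "_5s7_Lucene45_0.dvd",
    "_5s7.si",
]
by_format = group_by_format(files)
```

The token identifies the file format that wrote that part of the segment, not the Solr or Lucene release running the index, so seeing Lucene41 on a 4.6 installation is expected.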
Increasing number of SolrIndexSearcher (Leakage)?
Hello,

My Solr ran out of memory recently after I upgraded from Solr 4.0 to 4.6.1. I checked the heap dump and found that it has many SolrIndexSearcher (SIS) objects (24); I expected only 1 SIS because we have 1 core.

I ran some experiments:
- Right after Solr starts, it has only 1 SolrIndexSearcher.
- *But after I index some docs and run a softCommit or a hardCommit with openSearcher=false, the number of SolrIndexSearcher instances increases by 1.*
- After a hard commit with openSearcher=true, the number of SolrIndexSearcher (SIS) instances doesn't increase, but I found in the log that it opens a new searcher; I guess the old SIS is closed.

I don't know why the number of SIS objects increases like this and finally causes an OutOfMemory error. Can SolrIndexSearcher leak?

Regards,
Tien
Solr index filename doesn't match with solr version
Hello,

I recently upgraded from Solr 4.0 to Solr 4.6. I checked the Solr index folder and found these files:

_aars_*Lucene41*_0.doc
_aars_*Lucene41*_0.pos
_aars_*Lucene41*_0.tim
_aars_*Lucene41*_0.tip

I don't know why they don't have *Lucene46* in the file name. Is there something wrong?

Thanks,
Tien