Pagination issue when grouping
Hello,

I group search results by a high-cardinality field and paginate the search page by the number of groups, which I get with the param group.ngroups=true. That caused high CPU usage, so I turned it off.

Without ngroups=true I can't get the number of groups, so pagination is not correct: I have to use numFound, which counts documents rather than groups. As a result the pager always shows some trailing pages that are empty, because some results were already collapsed into groups on earlier pages.

For example, a search returns 11 results, 2 of which belong to the same group, so there are 10 groups (but I don't know that in advance because I set ngroups=false). At 10 per page, 11 results make the pager display 2 pages, but page 2 has 0 results.

Has anyone faced a similar issue and found a workaround?

Thanks,
Tien
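One common workaround, sketched here under assumptions rather than taken from this thread, is to stop computing a total page count and instead ask for one group more than the page size; if the extra group comes back, a next page exists. The sketch assumes that with grouping enabled `start` and `rows` count groups, and the field name `site` and both helper names are hypothetical.

```python
from typing import Dict, List, Tuple


def group_page_params(page: int, page_size: int, group_field: str) -> Dict[str, str]:
    """Build Solr params for one page of groups, asking for one extra group.

    `page` is 1-based. The extra group is never displayed; it only tells
    the caller whether a next page exists.
    """
    return {
        "group": "true",
        "group.field": group_field,      # hypothetical field name supplied by caller
        "group.ngroups": "false",        # avoid the expensive group count
        "start": str((page - 1) * page_size),
        "rows": str(page_size + 1),
    }


def split_page(groups: List[dict], page_size: int) -> Tuple[List[dict], bool]:
    """Return the groups to display and whether a next page exists."""
    shown = groups[:page_size]
    has_next = len(groups) > page_size
    return shown, has_next


if __name__ == "__main__":
    # The case from the message above: 10 groups in total, 10 per page.
    print(group_page_params(1, 10, "site"))
    shown, has_next = split_page([{"groupValue": str(i)} for i in range(10)], 10)
    print(len(shown), has_next)
```

The trade-off is that the pager can only offer "previous/next" links, not a "page N of M" display, because the total number of groups is never computed.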
Re: High CPU when use grouping group.ngroups=true
Without using ngroups=true, is there any way to handle pagination correctly when we collapse results using grouping?

Regards,
Tien

On Tue, May 23, 2017 at 9:55 PM, Nguyen Manh Tien <tien.nguyenm...@gmail.com> wrote:
> The collapse field is a high-cardinality field. I haven't profiled yet,
> but I will.
>
> Thanks,
> Tien
>
> On Tue, May 23, 2017 at 9:48 PM, Erick Erickson <erickerick...@gmail.com> wrote:
>> How many unique values in your group field? For high-cardinality
>> fields there's quite a bit of bookkeeping that needs to be done.
>>
>> Have you tried profiling to see where the CPU time is being spent?
>>
>> Best,
>> Erick
>>
>> On Tue, May 23, 2017 at 7:46 AM, Nguyen Manh Tien <tien.nguyenm...@gmail.com> wrote:
>> > Hi All,
>> >
>> > I recently switched from Solr field collapse/expand to grouping to
>> > collapse search results. All seems good, but CPU is always high
>> > (80-100%) when I set the param group.ngroups=true.
>> >
>> > We set ngroups=true to get the number of groups so that we can
>> > paginate search results correctly. Due to the CPU issue we need to
>> > turn it off.
>> >
>> > Is ngroups=true an expensive feature? Is there any way to prevent
>> > the CPU issue and still have correct pagination?
>> >
>> > Thanks,
>> > Tien
Re: High CPU when use grouping group.ngroups=true
The collapse field is a high-cardinality field. I haven't profiled yet, but I will.

Thanks,
Tien

On Tue, May 23, 2017 at 9:48 PM, Erick Erickson <erickerick...@gmail.com> wrote:
> How many unique values in your group field? For high-cardinality
> fields there's quite a bit of bookkeeping that needs to be done.
>
> Have you tried profiling to see where the CPU time is being spent?
>
> Best,
> Erick
>
> On Tue, May 23, 2017 at 7:46 AM, Nguyen Manh Tien <tien.nguyenm...@gmail.com> wrote:
> > Hi All,
> >
> > I recently switched from Solr field collapse/expand to grouping to
> > collapse search results. All seems good, but CPU is always high
> > (80-100%) when I set the param group.ngroups=true.
> >
> > We set ngroups=true to get the number of groups so that we can
> > paginate search results correctly. Due to the CPU issue we need to
> > turn it off.
> >
> > Is ngroups=true an expensive feature? Is there any way to prevent
> > the CPU issue and still have correct pagination?
> >
> > Thanks,
> > Tien
High CPU when use grouping group.ngroups=true
Hi All,

I recently switched from Solr field collapse/expand to grouping to collapse search results. All seems good, but CPU is always high (80-100%) when I set the param group.ngroups=true.

We set ngroups=true to get the number of groups so that we can paginate search results correctly. Due to the CPU issue we need to turn it off.

Is ngroups=true an expensive feature? Is there any way to prevent the CPU issue and still have correct pagination?

Thanks,
Tien
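A toy illustration of why a group count can cost more than a hit count. This is not Solr's code, only a sketch of the general idea: counting hits needs a single counter, while counting groups needs to remember every distinct group value seen among the matching documents, so the work and memory grow with the field's cardinality. The function name is hypothetical.

```python
from typing import Iterable, Tuple


def count_hits_and_groups(group_values: Iterable[str]) -> Tuple[int, int]:
    """Return (number of matching docs, number of distinct groups).

    `group_values` yields the group-field value of each matching document.
    """
    hits = 0
    seen = set()          # grows with the number of distinct group values
    for value in group_values:
        hits += 1         # numFound-style count: constant memory
        seen.add(value)   # ngroups-style count: one entry per distinct value
    return hits, len(seen)


if __name__ == "__main__":
    # 11 matching docs, two of which share a group.
    values = [f"g{i}" for i in range(10)] + ["g0"]
    print(count_hits_and_groups(values))
```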
Re: Explicit OR in edismax query with mm=100%
Hi,

In our case, mm=100% is fixed; it works well for many other queries. I just need an option in edismax so that for a query like "Solr OR Lucene", with an explicit OR, mm is ignored.

Thanks,
Tien

On Thu, Apr 20, 2017 at 9:56 AM, Yasufumi Mizoguchi <yasufumi0...@gmail.com> wrote:
> Hi,
>
> It looks that edismax respects the mm parameter in your case.
> You should set "mm=1", if you want to obtain the results of OR search.
> "mm=100%" means that all terms in your query should match.
>
> Regards,
> Yasufumi
>
> On 2017/04/20 10:40, Nguyen Manh Tien wrote:
>> Hi,
>>
>> I ran the query "Solr OR Lucene" with defType=edismax and mm=100%.
>> The search results show that the query works like "Solr AND Lucene"
>> (all terms required).
>>
>> Does edismax ignore the mm parameter because I already use OR
>> explicitly here?
>>
>> Thanks,
>> Tien
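Yasufumi's explanation can be sketched as arithmetic: with two optional terms, mm=100% requires both to match, while mm=1 requires only one. The second helper is one possible client-side workaround, assuming the application builds the request itself: send mm=1 only when the user typed an explicit uppercase OR. Both function names are hypothetical, only non-negative integer and percentage values of mm are handled, and the OR detection is deliberately naive (it would also fire on an OR inside a quoted phrase).

```python
import re


def required_matches(mm: str, optional_clauses: int) -> int:
    """How many optional clauses must match for a simple mm value.

    Handles only "N" and "N%" with N >= 0; percentages round down.
    """
    mm = mm.strip()
    if mm.endswith("%"):
        percent = int(mm[:-1])
        return optional_clauses * percent // 100
    return min(int(mm), optional_clauses)


# An uppercase OR surrounded by whitespace counts as an explicit operator.
_EXPLICIT_OR = re.compile(r"\sOR\s")


def choose_mm(query: str, default_mm: str = "100%") -> str:
    """Pick the mm value to send with this request."""
    return "1" if _EXPLICIT_OR.search(query) else default_mm


if __name__ == "__main__":
    print(required_matches("100%", 2))   # both terms required
    print(required_matches("1", 2))      # one term is enough
    print(choose_mm("Solr OR Lucene"))
    print(choose_mm("Solr Lucene"))
```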
Explicit OR in edismax query with mm=100%
Hi,

I ran the query "Solr OR Lucene" with defType=edismax and mm=100%. The search results show that the query works like "Solr AND Lucene" (all terms required).

Does edismax ignore the mm parameter because I already use OR explicitly here?

Thanks,
Tien
Re: Increasing number of SolrIndexSearcher (Leakage)?
I found that a custom component causes the issue. It creates a SolrQueryRequest but doesn't close it at the end, so the reference count on the SolrIndexSearcher never drops to 0 and the SIS is not released.

On Tue, Feb 18, 2014 at 9:31 PM, Yonik Seeley yo...@heliosearch.com wrote:
> On Mon, Feb 17, 2014 at 1:34 AM, Nguyen Manh Tien tien.nguyenm...@gmail.com wrote:
>> - *But after I index some docs and run a softCommit or a hardCommit
>> with openSearcher=false, the number of SolrIndexSearcher objects
>> increases by 1.*
>
> This is fine... it's more of an internal implementation detail (we open
> what is called a real-time searcher so we can drop some other data
> structures like the list of non-visible document updates, etc). If you
> did the commit again, the count should not continue to increase.
>
> If the number of searchers continues to increase, you have a searcher
> leak due to something else. Are you using any custom components or
> anything else that isn't stock Solr?
>
> -Yonik
> http://heliosearch.org - native off-heap filters and fieldcache for solr
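The leak described above can be modelled with a toy reference counter. These are not Solr's classes: `ToySearcher` and `ToyRequest` are hypothetical stand-ins, and the toy takes its reference when the request is constructed, which is a simplification. The point is only that a request which is never closed keeps the count above zero after the owner has let go.

```python
class ToySearcher:
    """Stand-in for a ref-counted searcher; starts with one ref held by its owner."""

    def __init__(self) -> None:
        self.refcount = 1
        self.closed = False

    def incref(self) -> None:
        self.refcount += 1

    def decref(self) -> None:
        self.refcount -= 1
        if self.refcount == 0:
            self.closed = True   # resources are released only at zero


class ToyRequest:
    """Stand-in for a request that holds a searcher reference until closed."""

    def __init__(self, searcher: ToySearcher) -> None:
        self.searcher = searcher
        self._closed = False
        searcher.incref()

    def close(self) -> None:
        if not self._closed:
            self._closed = True
            self.searcher.decref()

    def __enter__(self) -> "ToyRequest":
        return self

    def __exit__(self, exc_type, exc, tb) -> None:
        self.close()


def leaky_component(searcher: ToySearcher) -> None:
    ToyRequest(searcher)             # created but never closed


def careful_component(searcher: ToySearcher) -> None:
    with ToyRequest(searcher):       # closed even if the body raises
        pass


if __name__ == "__main__":
    for component in (leaky_component, careful_component):
        s = ToySearcher()
        component(s)
        s.decref()                   # owner drops its ref, e.g. a newer searcher replaced it
        print(component.__name__, s.refcount, s.closed)
```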
Re: Solr index filename doesn't match with solr version
Thanks Shawn and Tri for the information and explanation.

Tien

On Mon, Feb 17, 2014 at 1:36 PM, Tri Cao tm...@me.com wrote:
> Lucene main file formats actually don't change a lot in 4.x (or even
> 5.x), and the newer codecs just delegate to previous versions for most
> file types. The newer file types don't typically include Lucene's
> version in file names.
>
> For example, Lucene 4.6 codecs basically delegate stored fields and term
> vector file format to 4.1, doc format to 4.0, etc. and only implement
> the new segment info/fields info formats (the .si and .fnm files).
>
> https://github.com/apache/lucene-solr/blob/lucene_solr_4_6/lucene/core/src/java/org/apache/lucene/codecs/lucene46/Lucene46Codec.java#L50
>
> Hope this helps,
> Tri
>
> On Feb 16, 2014, at 08:52 PM, Shawn Heisey s...@elyograg.org wrote:
>> On 2/16/2014 7:25 PM, Nguyen Manh Tien wrote:
>>> I upgraded recently from Solr 4.0 to Solr 4.6. I checked the Solr
>>> index folder and found these files:
>>>
>>> _aars_*Lucene41*_0.doc
>>> _aars_*Lucene41*_0.pos
>>> _aars_*Lucene41*_0.tim
>>> _aars_*Lucene41*_0.tip
>>>
>>> I don't know why they don't have *Lucene46* in the file name.
>>
>> This is an indication that this part of the index is using a file
>> format introduced in Lucene 4.1. Here's what I have for one of my
>> index segments on a Solr 4.6.1 server:
>>
>> _5s7_2h.del
>> _5s7.fdt
>> _5s7.fdx
>> _5s7.fnm
>> _5s7_Lucene41_0.doc
>> _5s7_Lucene41_0.pos
>> _5s7_Lucene41_0.tim
>> _5s7_Lucene41_0.tip
>> _5s7_Lucene45_0.dvd
>> _5s7_Lucene45_0.dvm
>> _5s7.nvd
>> _5s7.nvm
>> _5s7.si
>> _5s7.tvd
>> _5s7.tvx
>>
>> It shows the same pieces as your list, but I am also using docValues
>> in my index, and those files indicate that they are using the format
>> from Lucene 4.5.
>>
>> I'm not sure why there are not version numbers in *all* of the file
>> extensions -- that happens in the Lucene layer, which is a bit of a
>> mystery to me.
>>
>> Thanks,
>> Shawn
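To see at a glance which format tags appear in an index directory, a small script can pull the `LuceneNN` part out of each file name. The pattern below is an assumption based only on the file names shown in this thread (`_<segment>_<Format>_<suffix>.<ext>`), not on any Lucene specification, and the function names are hypothetical.

```python
import re
from collections import Counter
from typing import Iterable, Optional

# Matches names such as "_5s7_Lucene41_0.doc"; names without a format
# tag (e.g. "_5s7.si" or "_5s7_2h.del") do not match.
_FORMAT_TAG = re.compile(r"^_[0-9a-z]+_([A-Za-z]+\d+)_\w+\.\w+$")


def format_tag(file_name: str) -> Optional[str]:
    """Return the format tag embedded in a file name, or None if absent."""
    match = _FORMAT_TAG.match(file_name)
    return match.group(1) if match else None


def count_format_tags(file_names: Iterable[str]) -> Counter:
    """Count files per format tag; untagged files are counted under None."""
    return Counter(format_tag(name) for name in file_names)


if __name__ == "__main__":
    listing = [
        "_5s7_2h.del", "_5s7.fdt", "_5s7.fnm",
        "_5s7_Lucene41_0.doc", "_5s7_Lucene41_0.pos",
        "_5s7_Lucene45_0.dvd", "_5s7.si",
    ]
    for tag, count in count_format_tags(listing).items():
        print(tag, count)
```

To run it against a real index, pass `os.listdir(...)` of the index directory instead of the hard-coded list.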
Solr index filename doesn't match with solr version
Hello,

I upgraded recently from Solr 4.0 to Solr 4.6. I checked the Solr index folder and found these files:

_aars_*Lucene41*_0.doc
_aars_*Lucene41*_0.pos
_aars_*Lucene41*_0.tim
_aars_*Lucene41*_0.tip

I don't know why they don't have *Lucene46* in the file name. Is there something wrong?

Thanks,
Tien
Increasing number of SolrIndexSearcher (Leakage)?
Hello,

My Solr hit an OOM recently after I upgraded from Solr 4.0 to 4.6.1. I checked the heap dump and found many SolrIndexSearcher (SIS) objects (24); I expected only 1 SIS because we have 1 core.

I ran some experiments:

- Right after Solr starts, there is only 1 SolrIndexSearcher.
- *But after I index some docs and run a softCommit or a hardCommit with openSearcher=false, the number of SolrIndexSearcher objects increases by 1.*
- After a hard commit with openSearcher=true, the number of SolrIndexSearcher (SIS) objects doesn't increase, but I found in the log that it opens a new searcher, so I guess the old SIS was closed.

I don't know why the number of SIS objects increases like this and finally causes the OutOfMemory error. Can SolrIndexSearcher leak?

Regards,
Tien