Pagination issue when grouping

2017-05-29 Thread Nguyen Manh Tien
Hello,

I group search results by a field with high cardinality.
I paginate the search results using the number of groups, which I get with the
param group.ngroups=true.
But that caused a high CPU issue, so I turned it off.

Without ngroups=true, I can't get the number of groups, so pagination is not
correct because I must use numFound instead.

The last pages always end up short or empty; the reason is that some results
were already collapsed into groups on earlier pages.

For example, a search returns 11 results, but 2 of those results belong to the
same group, so there are really only 10 groups (which I don't know in advance
because I set ngroups=false). With 11 results and a page size of 10, the
pagination shows 2 pages, but page 2 has 0 results.
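
As a minimal sketch of that arithmetic (the page size of 10 and everything else
in this snippet are illustrative assumptions), paging by numFound overestimates
the page count as soon as documents collapse into groups:

public class GroupPagingArithmetic {
    public static void main(String[] args) {
        int numFound = 11;  // matching documents reported by Solr
        int ngroups = 10;   // actual number of groups (unknown when group.ngroups=false)
        int pageSize = 10;  // assumed rows per page

        int pagesFromNumFound = (numFound + pageSize - 1) / pageSize;
        int pagesFromNgroups = (ngroups + pageSize - 1) / pageSize;

        System.out.println("pages from numFound: " + pagesFromNumFound); // 2
        System.out.println("pages from ngroups:  " + pagesFromNgroups);  // 1
        // The 2nd page computed from numFound contains 0 groups,
        // which is the empty last page described above.
    }
}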

Has anyone faced a similar issue and found a workaround?

Thanks,
Tien


Re: High CPU when using grouping with group.ngroups=true

2017-05-24 Thread Nguyen Manh Tien
Without using ngroups=true, is there any way to handle pagination correctly
when we collapse results using grouping?

Regards,
Tien

On Tue, May 23, 2017 at 9:55 PM, Nguyen Manh Tien <tien.nguyenm...@gmail.com> wrote:

> The collapse field is a high-cardinality field. I haven't profiled it yet but
> will do so.
>
> Thanks,
> Tien
>
> On Tue, May 23, 2017 at 9:48 PM, Erick Erickson <erickerick...@gmail.com>
> wrote:
>
>> How many unique values in your group field? For high-cardinality
>> fields there's quite a bit of bookkeeping that needs to be done.
>>
>> Have you tried profiling to see where the CPU time is being spent?
>>
>> Best,
>> Erick
>>
>> On Tue, May 23, 2017 at 7:46 AM, Nguyen Manh Tien
>> <tien.nguyenm...@gmail.com> wrote:
>> > Hi All,
>> >
>> > I recently switched from Solr field collapse/expand to grouping to
>> > collapse search results.
>> > All seems good, but CPU is always high (80-100%) when I set the param
>> > group.ngroups=true.
>> >
>> > We set ngroups=true to get the number of groups so that we can paginate
>> > search results correctly.
>> > Due to the CPU issue we need to turn it off.
>> >
>> > Is ngroups=true an expensive feature? Is there any way to prevent the
>> > CPU issue and still have correct pagination?
>> >
>> > Thanks,
>> > Tien
>>
>
>


Re: High CPU when using grouping with group.ngroups=true

2017-05-23 Thread Nguyen Manh Tien
The collapse field is a high-cardinality field. I haven't profiled it yet but
will do so.

Thanks,
Tien

On Tue, May 23, 2017 at 9:48 PM, Erick Erickson <erickerick...@gmail.com>
wrote:

> How many unique values in your group field? For high-cardinality
> fields there's quite a bit of bookkeeping that needs to be done.
>
> Have you tried profiling to see where the CPU time is being spent?
>
> Best,
> Erick
>
> On Tue, May 23, 2017 at 7:46 AM, Nguyen Manh Tien
> <tien.nguyenm...@gmail.com> wrote:
> > Hi All,
> >
> > I recently switched from Solr field collapse/expand to grouping to
> > collapse search results.
> > All seems good, but CPU is always high (80-100%) when I set the param
> > group.ngroups=true.
> >
> > We set ngroups=true to get the number of groups so that we can paginate
> > search results correctly.
> > Due to the CPU issue we need to turn it off.
> >
> > Is ngroups=true an expensive feature? Is there any way to prevent the
> > CPU issue and still have correct pagination?
> >
> > Thanks,
> > Tien
>


High CPU when using grouping with group.ngroups=true

2017-05-23 Thread Nguyen Manh Tien
Hi All,

I recently switched from Solr field collapse/expand to grouping to collapse
search results.
All seems good, but CPU is always high (80-100%) when I set the param
group.ngroups=true.

We set ngroups=true to get the number of groups so that we can paginate search
results correctly.
Due to the CPU issue we need to turn it off.

Is ngroups=true an expensive feature? Is there any way to prevent the CPU issue
and still have correct pagination?

Thanks,
Tien
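
For reference, here is a rough SolrJ sketch of the kind of grouping request
discussed in this thread. The core URL, the client setup, and the collapse
field name ("dedup_key") are assumptions for illustration, not details from the
thread:

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.client.solrj.response.GroupCommand;
import org.apache.solr.client.solrj.response.QueryResponse;

public class GroupingPaginationSketch {
    public static void main(String[] args) throws Exception {
        // Hypothetical core URL; adjust for your setup.
        try (HttpSolrClient client =
                 new HttpSolrClient.Builder("http://localhost:8983/solr/mycore").build()) {

            SolrQuery q = new SolrQuery("*:*");
            q.set("group", true);
            q.set("group.field", "dedup_key");  // high-cardinality collapse field (assumed name)
            q.set("group.ngroups", true);       // exact group count, but the costly part
            q.setRows(10);                      // one page of groups
            q.setStart(0);

            QueryResponse rsp = client.query(q);
            for (GroupCommand cmd : rsp.getGroupResponse().getValues()) {
                // matches = total matching documents; ngroups = total groups
                // (getNGroups() returns null when group.ngroups is not set).
                System.out.println("matches=" + cmd.getMatches()
                        + " ngroups=" + cmd.getNGroups());
            }
        }
    }
}

Reading the group count from GroupCommand.getNGroups() is what makes exact
pagination possible, and it is also the part this thread identifies as
expensive.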


Re: Explicit OR in edismax query with mm=100%

2017-05-12 Thread Nguyen Manh Tien
Hi,

In our case, mm=100% is fixed; it works well for many other queries.
I just need an option in edismax so that for a query like "Solr OR Lucene" with
an explicit OR, mm will be ignored.
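
For reference, a minimal SolrJ sketch of the two settings being discussed; the
core URL and the qf fields are assumptions, and the mm=1 variant is the
suggestion from Yasufumi's reply quoted below:

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrClient;

public class EdismaxMmSketch {
    public static void main(String[] args) throws Exception {
        // Hypothetical core URL and query fields; adjust for your setup.
        try (HttpSolrClient client =
                 new HttpSolrClient.Builder("http://localhost:8983/solr/mycore").build()) {

            SolrQuery q = new SolrQuery("Solr OR Lucene");
            q.set("defType", "edismax");
            q.set("qf", "title body");

            q.set("mm", "100%");  // all terms must match: behaves like "Solr AND Lucene"
            System.out.println("mm=100%: "
                    + client.query(q).getResults().getNumFound() + " hits");

            q.set("mm", "1");     // at least one term must match: OR semantics
            System.out.println("mm=1:    "
                    + client.query(q).getResults().getNumFound() + " hits");
        }
    }
}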

Thanks,
Tien

On Thu, Apr 20, 2017 at 9:56 AM, Yasufumi Mizoguchi <yasufumi0...@gmail.com>
wrote:

> Hi,
>
> It looks like edismax respects the mm parameter in your case.
> You should set "mm=1", if you want to obtain the results of OR search.
> "mm=100%" means that all terms in your query should match.
>
> Regards,
> Yasufumi
>
>
>
> On 2017/04/20 10:40, Nguyen Manh Tien wrote:
>
>> Hi,
>>
>> I run a query "Solr OR Lucene" with defType=edismax and mm=100%.
>> The search results show that the query behaves like "Solr AND Lucene" (all
>> terms required).
>>
>> Does edismax ignore the mm parameter because I already use OR explicitly here?
>>
>> Thanks,
>> Tien
>>
>>
>


Explicit OR in edismax query with mm=100%

2017-04-19 Thread Nguyen Manh Tien
Hi,

I run a query "Solr OR Lucene" with defType=edismax and mm=100%.
The search results show that the query behaves like "Solr AND Lucene" (all
terms required).

Does edismax ignore the mm parameter because I already use OR explicitly here?

Thanks,
Tien


Re: Increasing number of SolrIndexSearcher (Leakage)?

2014-02-18 Thread Nguyen Manh Tien
I found that a custom component caused the issue.
It creates a SolrQueryRequest but doesn't close it at the end, so the reference
count on the SolrIndexSearcher never drops to 0 and the SIS is not released.
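
A rough sketch of that kind of fix, assuming the component builds a
LocalSolrQueryRequest itself; the class and method names below are
illustrative, not taken from the actual component:

import org.apache.solr.common.params.SolrParams;
import org.apache.solr.core.SolrCore;
import org.apache.solr.request.LocalSolrQueryRequest;
import org.apache.solr.request.SolrQueryRequest;

public class InternalQueryHelper {
    void runInternalQuery(SolrCore core, SolrParams params) {
        SolrQueryRequest req = new LocalSolrQueryRequest(core, params);
        try {
            // ... execute the internal search with req ...
        } finally {
            // close() releases the request's reference to the SolrIndexSearcher;
            // without it the searcher's refcount never reaches 0 and the SIS leaks.
            req.close();
        }
    }
}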




On Tue, Feb 18, 2014 at 9:31 PM, Yonik Seeley yo...@heliosearch.com wrote:

 On Mon, Feb 17, 2014 at 1:34 AM, Nguyen Manh Tien
 tien.nguyenm...@gmail.com wrote:
  - *But after I index some docs and run a softCommit or a hardCommit with
  openSearcher=false, the number of SolrIndexSearcher objects increases by 1*

 This is fine... it's more of an internal implementation detail (we
 open what is called a real-time searcher so we can drop some other
 data structures like the list of non-visible document updates, etc).
 If you did the commit again, the count should not continue to
 increase.

 If the number of searchers continues to increase, you have a searcher
 leak due to something else.
 Are you using any custom components or anything else that isn't stock Solr?

 -Yonik
 http://heliosearch.org - native off-heap filters and fieldcache for solr



Re: Solr index filename doesn't match Solr version

2014-02-17 Thread Nguyen Manh Tien
Thanks Shawn and Tri for the info and explanations.
Tien


On Mon, Feb 17, 2014 at 1:36 PM, Tri Cao tm...@me.com wrote:

 Lucene's main file formats actually don't change a lot in 4.x (or even 5.x),
 and the newer codecs just delegate to previous versions for most file
 types. The newer file types don't typically include Lucene's version in the
 file names.

 For example, the Lucene 4.6 codec basically delegates the stored fields and
 term vector file formats to 4.1, the doc format to 4.0, etc., and only
 implements the new segment info / field infos formats (the .si and .fnm
 files).


 https://github.com/apache/lucene-solr/blob/lucene_solr_4_6/lucene/core/src/java/org/apache/lucene/codecs/lucene46/Lucene46Codec.java#L50

 Hope this helps,
 Tri


 On Feb 16, 2014, at 08:52 PM, Shawn Heisey s...@elyograg.org wrote:

 On 2/16/2014 7:25 PM, Nguyen Manh Tien wrote:

 I recently upgraded from Solr 4.0 to Solr 4.6.
 I checked the Solr index folder and found these files:

 _aars_*Lucene41*_0.doc
 _aars_*Lucene41*_0.pos
 _aars_*Lucene41*_0.tim
 _aars_*Lucene41*_0.tip

 I don't know why they don't have *Lucene46* in the file names.


 This is an indication that this part of the index is using a file format
 introduced in Lucene 4.1.

 Here's what I have for one of my index segments on a Solr 4.6.1 server:

 _5s7_2h.del
 _5s7.fdt
 _5s7.fdx
 _5s7.fnm
 _5s7_Lucene41_0.doc
 _5s7_Lucene41_0.pos
 _5s7_Lucene41_0.tim
 _5s7_Lucene41_0.tip
 _5s7_Lucene45_0.dvd
 _5s7_Lucene45_0.dvm
 _5s7.nvd
 _5s7.nvm
 _5s7.si
 _5s7.tvd
 _5s7.tvx

 It shows the same pieces as your list, but I am also using docValues in
 my index, and those files indicate that they are using the format from
 Lucene 4.5. I'm not sure why there are not version numbers in *all* of
 the file extensions -- that happens in the Lucene layer, which is a bit
 of a mystery to me.

 Thanks,
 Shawn




Solr index filename doesn't match Solr version

2014-02-16 Thread Nguyen Manh Tien
Hello,

I recently upgraded from Solr 4.0 to Solr 4.6.
I checked the Solr index folder and found these files:

_aars_*Lucene41*_0.doc
_aars_*Lucene41*_0.pos
_aars_*Lucene41*_0.tim
_aars_*Lucene41*_0.tip

I don't know why they don't have *Lucene46* in the file names.

Is there something wrong?

Thanks,
Tien


Increasing number of SolrIndexSearcher (Leakage)?

2014-02-16 Thread Nguyen Manh Tien
Hello,

My Solr got an OOM recently after I upgraded from Solr 4.0 to 4.6.1.
I checked a heap dump and found that it has many SolrIndexSearcher (SIS)
objects (24); I expected only 1 SIS because we have 1 core.

I ran some experiments:
- Right after starting Solr, there is only 1 SolrIndexSearcher.
- *But after I index some docs and run a softCommit or a hardCommit with
openSearcher=false, the number of SolrIndexSearcher objects increases by 1.*
- With a hard commit with openSearcher=true, the number of SolrIndexSearcher
(SIS) objects doesn't increase, but I found in the log that it opens a new
searcher, so I guess the old SIS is closed.

I don't know why the number of SIS objects increases like this and finally
causes an OutOfMemory error. Can SolrIndexSearcher leak?

Regards,
Tien