Pagination issue when grouping

2017-05-29 Thread Nguyen Manh Tien
Hello,

I group search results by a field with high cardinality.
I paginated the search page using the number of groups, via the param
group.ngroups=true, but that caused high CPU usage, so I turned it off.

Without ngroups=true, I can't get the number of groups, so pagination is
not correct because I must use numFound instead.

The last pages always come up empty, because some results were already
collapsed into groups on previous pages.

For example, a search returns 11 results, but 2 of those results belong to
the same group, so there are only 10 groups (which I can't know in advance
because I set ngroups=false). With 11 results, pagination displays 2 pages,
but page 2 has 0 results.

Has anyone faced a similar issue and found a workaround?

Thanks,
Tien
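To make the arithmetic in the example concrete, here is a small Python sketch that computes the page count once from the document count (numFound) and once from the group count (what group.ngroups=true would report). The page size of 10 is an assumption that matches the two-page outcome described above.

```python
def page_count(total: int, rows: int) -> int:
    """Pages needed to show `total` items at `rows` items per page (ceiling division)."""
    return (total + rows - 1) // rows

rows = 10        # assumed page size (groups per page)
num_found = 11   # matching documents, as reported by numFound
n_groups = 10    # distinct groups, as group.ngroups=true would report

pages_from_docs = page_count(num_found, rows)    # the second page comes back empty
pages_from_groups = page_count(n_groups, rows)   # what the pager should show

print(pages_from_docs, pages_from_groups)
```

Paging by numFound can only overestimate the page count in this situation, since every group contains at least one document.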


Re: High CPU when using grouping with group.ngroups=true

2017-05-24 Thread Nguyen Manh Tien
Without using ngroups=true, is there any way to handle pagination correctly
when we collapse results using grouping?

Regards,
Tien


Re: High CPU when using grouping with group.ngroups=true

2017-05-23 Thread Nguyen Manh Tien
The collapse field is a high-cardinality field. I haven't profiled it yet,
but I will.

Thanks,
Tien

On Tue, May 23, 2017 at 9:48 PM, Erick Erickson 
wrote:

> How many unique values in your group field? For high-cardinality
> fields there's quite a bit of bookkeeping that needs to be done.
>
> Have you tried profiling to see where the CPU time is being spent?
>
> Best,
> Erick
>
> On Tue, May 23, 2017 at 7:46 AM, Nguyen Manh Tien
>  wrote:
> > Hi All,
> >
> > I recently switched from Solr field collapse/expand to grouping to
> > collapse search results.
> > Everything seems fine, but CPU is always high (80-100%) when I set the
> > param group.ngroups=true.
> >
> > We set ngroups=true to get the number of groups so that we can paginate
> > the search results correctly.
> > Due to the CPU issue, we need to turn it off.
> >
> > Is ngroups=true an expensive feature? Is there any way to prevent the CPU
> > issue and still have correct pagination?
> >
> > Thanks,
> > Tien
>
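One way to picture the bookkeeping Erick mentions is the toy model below. It is not Lucene's grouping code, only a simplified illustration under that assumption: picking the top n groups for one page can be done while holding at most n groups in memory, whereas counting every group (what group.ngroups=true requests) has to remember each distinct value it sees, so that state grows with the field's cardinality.

```python
from typing import Dict, Iterable, List, Set, Tuple

def top_groups(docs: Iterable[Tuple[str, float]], n: int) -> List[str]:
    """Best n groups by their highest document score, holding at most n groups at a time."""
    best: Dict[str, float] = {}
    for group, score in docs:
        if group in best:
            best[group] = max(best[group], score)
        elif len(best) < n:
            best[group] = score
        else:
            weakest = min(best, key=lambda g: best[g])
            if score > best[weakest]:
                del best[weakest]          # evict the weakest group to stay bounded
                best[group] = score
    return sorted(best, key=lambda g: best[g], reverse=True)

def count_groups(docs: Iterable[Tuple[str, float]]) -> int:
    """Number of distinct groups; the set keeps one entry per distinct group value."""
    seen: Set[str] = set()
    for group, _score in docs:
        seen.add(group)
    return len(seen)

# (group value, score) pairs for the matching documents; values are made up.
docs = [("a", 1.0), ("b", 5.0), ("a", 7.0), ("c", 3.0), ("d", 2.0)]
print(top_groups(docs, 2), count_groups(docs))
```

With a field where almost every document has its own group value, the set in count_groups ends up nearly as large as the result set, while top_groups never holds more than n entries.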


High CPU when using grouping with group.ngroups=true

2017-05-23 Thread Nguyen Manh Tien
Hi All,

I recently switched from Solr field collapse/expand to grouping to collapse
search results.
Everything seems fine, but CPU is always high (80-100%) when I set the param
group.ngroups=true.

We set ngroups=true to get the number of groups so that we can paginate the
search results correctly.
Due to the CPU issue, we need to turn it off.

Is ngroups=true an expensive feature? Is there any way to prevent the CPU
issue and still have correct pagination?

Thanks,
Tien
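For reference, a sketch of a grouped, paginated request and of where the group count appears in the response. The parameter names (group, group.field, group.limit, group.ngroups, start, rows) are standard Solr result-grouping parameters, and with grouping, start and rows page over groups rather than documents. The response layout assumed here is the default grouped format; the field name shop_id, the query, and both helper functions are hypothetical.

```python
from urllib.parse import urlencode

def grouped_params(query: str, group_field: str, page: int, rows: int, ngroups: bool) -> dict:
    """Query parameters for one page of grouped results (pages numbered from 0)."""
    params = {
        "q": query,
        "group": "true",
        "group.field": group_field,
        "group.limit": "1",            # documents returned per group
        "start": str(page * rows),     # offset into the list of groups
        "rows": str(rows),             # groups per page
    }
    if ngroups:
        params["group.ngroups"] = "true"  # ask Solr to count all groups (the costly part)
    return params

def total_for_paging(response: dict, group_field: str) -> int:
    """Group count if Solr returned it, otherwise the document count ("matches")."""
    grouped = response["grouped"][group_field]
    return grouped.get("ngroups", grouped["matches"])

# Hypothetical field name and query, for illustration only.
print(urlencode(grouped_params("laptop", "shop_id", page=1, rows=10, ngroups=True)))
```

Without group.ngroups, the fallback in total_for_paging overcounts in exactly the way described in the pagination thread, because matches counts documents rather than groups.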


Re: Explicit OR in edismax query with mm=100%

2017-05-12 Thread Nguyen Manh Tien
Hi,

In our case, mm=100% is fixed; it works well for many other queries.
I just need an option in edismax so that for a query like "Solr OR Lucene",
with an explicit OR, mm is ignored.

Thanks,
Tien

On Thu, Apr 20, 2017 at 9:56 AM, Yasufumi Mizoguchi 
wrote:

> Hi,
>
> It looks like edismax respects the mm parameter in your case.
> You should set "mm=1" if you want to obtain the results of an OR search.
> "mm=100%" means that all terms in your query should match.
>
> Regards,
> Yasufumi
>
>
>
> On 2017/04/20 10:40, Nguyen Manh Tien wrote:
>
>> Hi,
>>
>> I ran the query "Solr OR Lucene" with defType=edismax and mm=100%.
>> The search results show that the query behaves like "Solr AND Lucene"
>> (all terms required).
>>
>> Does edismax ignore the mm parameter because I already use OR explicitly
>> here?
>>
>> Thanks,
>> Tien
>>
>>
>
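A toy model of Yasufumi's point, treating "Solr OR Lucene" as two optional clauses and modelling only two simple mm forms: a plain integer, and a positive percentage, which the Solr documentation describes as rounded down. It ignores text analysis, field boosts, and any special handling edismax applies to explicit operators, so it illustrates the arithmetic rather than predicting edismax output.

```python
def min_should_match(spec: str, optional_clauses: int) -> int:
    """Required clause count for mm specs like "1" or "100%" (other forms not modelled)."""
    if spec.endswith("%"):
        percent = int(spec[:-1])
        if percent < 0:
            raise ValueError("negative percentages are not modelled here")
        return optional_clauses * percent // 100   # percentages round down
    count = int(spec)
    if count < 0:
        raise ValueError("negative counts are not modelled here")
    return count

def doc_matches(doc_terms: set, query_terms: list, mm: str) -> bool:
    """True if the document contains at least the required number of query terms."""
    required = min_should_match(mm, len(query_terms))
    hits = sum(1 for term in query_terms if term in doc_terms)
    return hits >= required

query = ["solr", "lucene"]          # the two optional clauses of "Solr OR Lucene"
only_solr = {"solr", "search"}      # hypothetical document containing just one of them

print(doc_matches(only_solr, query, "100%"))  # False: both terms required
print(doc_matches(only_solr, query, "1"))     # True: one term is enough
```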


Explicit OR in edismax query with mm=100%

2017-04-19 Thread Nguyen Manh Tien
Hi,

I ran the query "Solr OR Lucene" with defType=edismax and mm=100%.
The search results show that the query behaves like "Solr AND Lucene" (all
terms required).

Does edismax ignore the mm parameter because I already use OR explicitly here?

Thanks,
Tien


Re: Increasing number of SolrIndexSearcher (Leakage)?

2014-02-18 Thread Nguyen Manh Tien
I found that a custom component caused the issue.
It creates a SolrQueryRequest but doesn't close it at the end, so the reference
count on the SolrIndexSearcher never drops to 0 and the SIS is not released.

On Tue, Feb 18, 2014 at 9:31 PM, Yonik Seeley  wrote:

> On Mon, Feb 17, 2014 at 1:34 AM, Nguyen Manh Tien
>  wrote:
> > - *But after I index some docs and run a softCommit or a hardCommit with
> > openSearcher=false, the number of SolrIndexSearchers increases by 1.*
>
> This is fine... it's more of an internal implementation detail (we
> open what is called a "real-time" searcher so we can drop some other
> data structures like the list of non-visible document updates, etc).
> If you did the commit again, the count should not continue to
> increase.
>
> If the number of searchers continues to increase, you have a searcher
> leak due to something else.
> Are you using any custom components or anything else that isn't stock Solr?
>
> -Yonik
> http://heliosearch.org - native off-heap filters and fieldcache for solr
>
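The leak described above comes down to reference counting: a searcher is released only when its count returns to zero, so a single request that is never closed keeps it alive. The Python model below illustrates that mechanism; the Searcher and Request classes are hypothetical stand-ins, not Solr's API. In actual Solr code, the equivalent fix is to call close() on the SolrQueryRequest, typically in a finally block.

```python
class Searcher:
    """Hypothetical stand-in for a ref-counted searcher."""

    def __init__(self) -> None:
        self.refcount = 1      # the core's own reference to its current searcher
        self.closed = False

    def incref(self) -> None:
        self.refcount += 1

    def decref(self) -> None:
        self.refcount -= 1
        if self.refcount == 0:
            self.closed = True  # resources are freed only when nobody holds a reference


class Request:
    """Hypothetical stand-in for a request that borrows the current searcher."""

    def __init__(self, searcher: Searcher) -> None:
        self.searcher = searcher
        searcher.incref()

    def close(self) -> None:
        self.searcher.decref()

    def __enter__(self) -> "Request":
        return self

    def __exit__(self, *exc) -> None:
        self.close()           # runs even if the body raises, like a finally block


def old_searcher_released(close_request: bool) -> bool:
    old = Searcher()
    if close_request:
        with Request(old):
            pass               # the custom component would do its work here
    else:
        Request(old)           # never closed: its reference is never given back
    old.decref()               # a commit opens a new searcher; the core drops the old one
    return old.closed


print(old_searcher_released(True), old_searcher_released(False))
```

Repeated over many commits, each unclosed request keeps one more old searcher alive, which is one way the count could climb to the 24 instances seen in the heap dump.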


Re: Solr index filename doesn't match with Solr version

2014-02-17 Thread Nguyen Manh Tien
Thanks, Shawn and Tri, for your information and explanations.
Tien


On Mon, Feb 17, 2014 at 1:36 PM, Tri Cao  wrote:

> Lucene main file formats actually don't change a lot in 4.x (or even 5.x),
> and the newer codecs just delegate to previous versions for most file
> types. The newer file types don't typically include Lucene's version in
> file names.
>
> For example, the Lucene 4.6 codec basically delegates stored fields and term
> vector file format to 4.1, doc format to 4.0, etc. and only implement the
> new segment info/fields info formats (the .si and .fnm files).
>
>
> https://github.com/apache/lucene-solr/blob/lucene_solr_4_6/lucene/core/src/java/org/apache/lucene/codecs/lucene46/Lucene46Codec.java#L50
>
> Hope this helps,
> Tri
>
>
> On Feb 16, 2014, at 08:52 PM, Shawn Heisey  wrote:
>
> On 2/16/2014 7:25 PM, Nguyen Manh Tien wrote:
>
> I recently upgraded from Solr 4.0 to Solr 4.6.
>
> I checked the Solr index folder and found these files:
>
> _aars_*Lucene41*_0.doc
>
> _aars_*Lucene41*_0.pos
>
> _aars_*Lucene41*_0.tim
>
> _aars_*Lucene41*_0.tip
>
> I don't know why they don't have *Lucene46* in the file name.
>
>
> This is an indication that this part of the index is using a file format
> introduced in Lucene 4.1.
>
> Here's what I have for one of my index segments on a Solr 4.6.1 server:
>
> _5s7_2h.del
> _5s7.fdt
> _5s7.fdx
> _5s7.fnm
> _5s7_Lucene41_0.doc
> _5s7_Lucene41_0.pos
> _5s7_Lucene41_0.tim
> _5s7_Lucene41_0.tip
> _5s7_Lucene45_0.dvd
> _5s7_Lucene45_0.dvm
> _5s7.nvd
> _5s7.nvm
> _5s7.si
> _5s7.tvd
> _5s7.tvx
>
> It shows the same pieces as your list, but I am also using docValues in
> my index, and those files indicate that they are using the format from
> Lucene 4.5. I'm not sure why there are not version numbers in *all* of
> the file extensions -- that happens in the Lucene layer, which is a bit
> of a mystery to me.
>
> Thanks,
> Shawn
>
>
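The pattern in Shawn's listing can be checked mechanically. This sketch groups his file names by the format name embedded in them; the regular expression is an assumption inferred from these examples only, not a statement of Lucene's naming rules. In this listing, only the postings files (Lucene41) and the docValues files (Lucene45) carry a format name.

```python
import re
from collections import defaultdict
from typing import Dict, List

# Assumed pattern, based only on the names above: <segment>_<Format><version>_<n>.<ext>
FORMAT_IN_NAME = re.compile(r"^_[a-z0-9]+_([A-Za-z]+[0-9]+)_[0-9]+\.([a-z]+)$")

def extensions_by_format(filenames: List[str]) -> Dict[str, List[str]]:
    """Group file extensions by the format name in the file name, if there is one."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for name in filenames:
        match = FORMAT_IN_NAME.match(name)
        if match:
            groups[match.group(1)].append(match.group(2))
        else:
            groups["(none)"].append(name.rsplit(".", 1)[1])
    return dict(groups)

shawn_segment = [
    "_5s7_2h.del", "_5s7.fdt", "_5s7.fdx", "_5s7.fnm",
    "_5s7_Lucene41_0.doc", "_5s7_Lucene41_0.pos", "_5s7_Lucene41_0.tim", "_5s7_Lucene41_0.tip",
    "_5s7_Lucene45_0.dvd", "_5s7_Lucene45_0.dvm",
    "_5s7.nvd", "_5s7.nvm", "_5s7.si", "_5s7.tvd", "_5s7.tvx",
]
for fmt, exts in extensions_by_format(shawn_segment).items():
    print(fmt, exts)
```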


Increasing number of SolrIndexSearcher (Leakage)?

2014-02-16 Thread Nguyen Manh Tien
Hello,

My Solr hit an OOM recently after I upgraded from Solr 4.0 to 4.6.1.
I checked a heap dump and found that it has many SolrIndexSearcher (SIS)
objects (24); I expected only 1 SIS because we have 1 core.

I ran some experiments:
- Right after starting Solr, there is only 1 SolrIndexSearcher.
- *But after I index some docs and run a softCommit or a hardCommit with
openSearcher=false, the number of SolrIndexSearchers increases by 1.*
- When I hard commit with openSearcher=true, the number of SolrIndexSearchers
(SIS) doesn't increase, but I found in the log that it opens a new searcher;
I guess the old SIS is closed.

I don't know why the number of SIS increases like this and finally causes an
OutOfMemoryError. Can a SolrIndexSearcher leak?

Regards,
Tien


Solr index filename doesn't match with Solr version

2014-02-16 Thread Nguyen Manh Tien
Hello,

I recently upgraded from Solr 4.0 to Solr 4.6.
I checked the Solr index folder and found these files:

_aars_*Lucene41*_0.doc
_aars_*Lucene41*_0.pos
_aars_*Lucene41*_0.tim
_aars_*Lucene41*_0.tip

I don't know why they don't have *Lucene46* in the file name.

Is there something wrong?

Thanks,
Tien