Re: Deduplication of search result with custom with custom sort

2020-10-13 Thread Dmitry Emets
equent and > you are not strict on 1000, you might retrieve more, let's say 2000, without > grouping and then do the deduping after. > > Cheers, > Diego > > > From: java-user@lucene.apache.org At: 10/12/20 13:02:46 To: > java-user@lucene.apache.org > Subject: Re: Dedu

Re: Deduplication of search result with custom with custom sort

2020-10-12 Thread Diego Ceccarelli (BLOOMBERG/ LONDON)
let's say 2000 without grouping and then do the deduping after. Cheers, Diego From: java-user@lucene.apache.org At: 10/12/20 13:02:46 To: java-user@lucene.apache.org Subject: Re: Deduplication of search result with custom with custom sort Thank you very much for helping! There isn't muc
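The "retrieve more without grouping, then dedupe after" idea from this message could look roughly like the sketch below. It is plain Java and does not call Lucene at all: the `Hit` class, its `dedupeKey` field and the `dedupe` method are hypothetical names made up for illustration, and the sketch assumes the over-fetched hits (say 2000) have already been read out of the search result in sorted order together with the docvalue used for deduplication.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class OverFetchDedup {

    /** Hypothetical hit: a document id plus the docvalue used as the dedupe key. */
    public static final class Hit {
        public final int docId;
        public final long dedupeKey;

        public Hit(int docId, long dedupeKey) {
            this.docId = docId;
            this.dedupeKey = dedupeKey;
        }
    }

    /**
     * Keeps the first hit seen for each dedupeKey, preserving the incoming
     * (already sorted) order, and stops once 'limit' unique hits are collected.
     * May return fewer than 'limit' hits if the over-fetched list has too many duplicates.
     */
    public static List<Hit> dedupe(List<Hit> sortedHits, int limit) {
        Set<Long> seenKeys = new HashSet<>();
        List<Hit> unique = new ArrayList<>();
        for (Hit hit : sortedHits) {
            if (unique.size() >= limit) {
                break;
            }
            // Set.add returns false when the key was already present.
            if (seenKeys.add(hit.dedupeKey)) {
                unique.add(hit);
            }
        }
        return unique;
    }
}
```

Because the first hit per key wins, the best-ranked document of each group is kept under whatever sort was used for the search. The trade-off is the one implied in the message: if duplicates are frequent, 2000 fetched hits may not yield 1000 unique ones, so the result can come up short.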

Re: Deduplication of search result with custom with custom sort

2020-10-12 Thread Dmitry Emets
about the use case? > > There might be another way to achieve the same result. > > > > What are these documents? > > Why do you need 1000 docs per user? > > > > > > From: java-user@lucene.apache.org At: 10/09/20 14:25:02 To: > > java-user@lucene.apache.org &

Re: Deduplication of search result with custom with custom sort

2020-10-09 Thread Jigar Shah
result. > > What are these documents? > Why do you need 1000 docs per user? > > > From: java-user@lucene.apache.org At: 10/09/20 14:25:02 To: > java-user@lucene.apache.org > Subject: Re: Deduplication of search result with custom with custom sort > > 6_500_000 is the to

Re: Deduplication of search result with custom with custom sort

2020-10-09 Thread Diego Ceccarelli (BLOOMBERG/ LONDON)
: Deduplication of search result with custom with custom sort 6_500_000 is the total count of groups in the entire collection. I only return the top 1000 to users. I use Lucene where I have documents that can have the same docvalue, and I want to deduplicate these documents by this docvalue during search. Also

Re: Deduplication of search result with custom with custom sort

2020-10-09 Thread Dmitry Emets
6_500_000 is the total count of groups in the entire collection. I only return the top 1000 to users. I use Lucene where I have documents that can have the same docvalue, and I want to deduplicate these documents by this docvalue during search. Also, I sort my documents by multiple fields and becaus

Re: Deduplication of search result with custom with custom sort

2020-10-09 Thread Erick Erickson
This is going to be fairly painful. You need to keep a list 6.5M items long, sorted. Before diving in there, I’d really back up and ask what the use-case is. Returning 6.5M docs to a user is useless, so are you doing some kind of analytics maybe? In which case, and again assuming you’re using S

Re: Deduplication of search result with custom with custom sort

2020-10-09 Thread Dmitry Emets
I have 12_000_000 documents, 6_500_000 groups. With sort: it takes around 1 sec without grouping, 2 sec with grouping and 12 sec with setAllGroups(true). Without sort: it takes around 0.2 sec without grouping, 0.6 sec with grouping and 10 sec with setAllGroups(true). Thank you, Erick, I will look in

Re: Deduplication of search result with custom with custom sort

2020-10-09 Thread Erick Erickson
At the Solr level, see CollapsingQParserPlugin: https://lucene.apache.org/solr/guide/8_6/collapse-and-expand-results.html. You could perhaps steal some ideas from that if you need this at the Lucene level. Best, Erick > On Oct 9, 2020, at 7:25 AM, Diego Ceccarelli (BLOOMBERG/ LONDON) > wrote: >

Re: Deduplication of search result with custom with custom sort

2020-10-09 Thread Diego Ceccarelli (BLOOMBERG/ LONDON)
@lucene.apache.org Subject: Re: Deduplication of search result with custom with custom sort Yes, it is. Fri, 9 Oct 2020 at 14:25, Diego Ceccarelli (BLOOMBERG/ LONDON) <dceccarel...@bloomberg.net>: > Is the field that you are using to dedupe stored as a docvalue? > > From: java-user@lucene.ap

Re: Deduplication of search result with custom with custom sort

2020-10-09 Thread Dmitry Emets
Yes, it is. Fri, 9 Oct 2020 at 14:25, Diego Ceccarelli (BLOOMBERG/ LONDON) <dceccarel...@bloomberg.net>: > Is the field that you are using to dedupe stored as a docvalue? > > From: java-user@lucene.apache.org At: 10/09/20 12:18:04 To: > java-user@lucene.apache.org > Subject: Deduplication of sea