equent and
> you are not strict on 1000, you might retrieve more, let's say 2000, without
> grouping and then do the deduping afterwards.
>
> Cheers,
> Diego
>
>
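The over-fetch suggestion above (retrieve about 2000 hits without grouping, then dedupe) can be sketched in plain Java. This is only an illustration under stated assumptions: `Hit`, `docId`, `groupKey`, and `dedupe` are hypothetical names, not Lucene API, and the input list is assumed to be already ordered by the search sort.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

final class OverFetchDedup {
    // Hypothetical hit: a document id plus the value of the dedup field
    // (the docvalue discussed in the thread). Not a Lucene type.
    record Hit(int docId, long groupKey) {}

    /**
     * Keeps the first hit seen for each groupKey, preserving input order,
     * and stops once `limit` unique hits have been collected.
     * Assumes `sortedHits` is already in the desired sort order.
     */
    static List<Hit> dedupe(List<Hit> sortedHits, int limit) {
        Set<Long> seen = new HashSet<>();
        List<Hit> out = new ArrayList<>();
        for (Hit h : sortedHits) {
            if (out.size() >= limit) {
                break;
            }
            // Set.add returns false when the key was already present.
            if (seen.add(h.groupKey())) {
                out.add(h);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<Hit> hits = List.of(
            new Hit(1, 10L), new Hit(2, 10L), new Hit(3, 20L), new Hit(4, 30L));
        System.out.println(dedupe(hits, 2));
    }
}
```

If fewer than `limit` unique hits survive, the caller would have to fetch a larger window and try again, which is why this approach only fits when an exact count of 1000 is not a hard requirement.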
> From: java-user@lucene.apache.org At: 10/12/20 13:02:46 To:
> java-user@lucene.apache.org
> Subject: Re: Dedu
Subject: Re: Deduplication of search result with custom with custom sort
Thank you very much for helping!
There isn't muc
about the use case?
> > There might be another way to achieve the same result.
> >
> > What are these documents?
> > Why do you need 1000 docs per user?
> >
> >
> > From: java-user@lucene.apache.org At: 10/09/20 14:25:02 To:
> > java-user@lucene.apache.org
result.
>
> What are these documents?
> Why do you need 1000 docs per user?
>
>
> From: java-user@lucene.apache.org At: 10/09/20 14:25:02 To:
> java-user@lucene.apache.org
> Subject: Re: Deduplication of search result with custom with custom sort
>
> 6_500_000 is the to
6_500_000 is the total count of groups in the entire collection. I only
return the top 1000 to users.
I use Lucene where I have documents that can have the same docvalue, and I
want to deduplicate these documents by this docvalue during search.
Also, I sort my documents by multiple fields and becaus
This is going to be fairly painful. You need to keep a list 6.5M
items long, sorted.
Before diving in there, I’d really back up and ask what the use-case
is. Returning 6.5M docs to a user is useless, so are you doing
some kind of analytics maybe? In which case, and again
assuming you’re using S
I have 12_000_000 documents, 6_500_000 groups
With sort: It takes around 1 sec without grouping, 2 sec with grouping and
12 sec with setAllGroups(true)
Without sort: It takes around 0.2 sec without grouping, 0.6 sec with
grouping and 10 sec with setAllGroups(true)
Thank you, Erick, I will look in
At the Solr level, see CollapsingQParserPlugin:
https://lucene.apache.org/solr/guide/8_6/collapse-and-expand-results.html
You could perhaps steal some ideas from that if you
need this at the Lucene level.
Best,
Erick
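For readers who need this outside Solr, the general collapse idea (keep one "head" document per group, then sort the heads) can be sketched without any Lucene types. This is a rough illustration only, not the plugin's actual implementation: `Doc`, `sortValue`, and `collapse` are hypothetical names, and a single sort value stands in for the multi-field sort mentioned in the thread.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class CollapseSketch {
    // Hypothetical document: id, dedup key, and one sort value.
    record Doc(int docId, long groupKey, double sortValue) {}

    /**
     * One pass over the (unsorted) matches keeps the best document per
     * group according to `order`; the group heads are then sorted and
     * truncated to `topN`.
     */
    static List<Doc> collapse(Iterable<Doc> matches, Comparator<Doc> order, int topN) {
        Map<Long, Doc> heads = new HashMap<>();
        for (Doc d : matches) {
            // Keep whichever of the current head and the candidate sorts first;
            // on a tie the current head stays.
            heads.merge(d.groupKey(), d,
                (cur, cand) -> order.compare(cand, cur) < 0 ? cand : cur);
        }
        List<Doc> sorted = new ArrayList<>(heads.values());
        sorted.sort(order);
        return new ArrayList<>(sorted.subList(0, Math.min(topN, sorted.size())));
    }
}
```

Note that the map holds one entry per group, so with 6.5M groups it is the 6.5M-item structure mentioned earlier in the thread; the sketch shows the shape of the work rather than a way around that cost.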
> On Oct 9, 2020, at 7:25 AM, Diego Ceccarelli (BLOOMBERG/ LONDON)
> wrote:
>
@lucene.apache.org
Subject: Re: Deduplication of search result with custom with custom sort
Yes, it is
On Fri, Oct 9, 2020 at 14:25, Diego Ceccarelli (BLOOMBERG/ LONDON) <
dceccarel...@bloomberg.net> wrote:
> Is the field that you are using to dedupe stored as a docvalue?
>
> From: java-user@lucene.apache.org At: 10/09/20 12:18:04 To:
> java-user@lucene.apache.org
> Subject: Deduplication of sea