Hello,
We are using TF-IDF for scoring (yet to migrate to BM25). Different
entities (DOC_TYPES) are crunched & stored together in a single index.
When it comes to IDF, I find that there is a single value computed across
documents & stored as part of TermStats, whereas our documents are not
homoge
Is there any reason why you are not storing each DOC_TYPE in its own index?
On Tue, Dec 3, 2019 at 1:50 PM Ravikumar Govindarajan
wrote:
>
> Hello,
>
> We are using TF-IDF for scoring (Yet to migrate to BM25). Different
> entities (DOC_TYPES) are crunched & stored together in a single index.
>
>
It is enough to give each its own field.
On Tue, Dec 3, 2019 at 7:57 AM Adrien Grand wrote:
> Is there any reason why you are not storing each DOC_TYPE in its own index?
>
> On Tue, Dec 3, 2019 at 1:50 PM Ravikumar Govindarajan
> wrote:
> >
> > Hello,
> >
> > We are using TF-IDF for scoring (Ye
Hi Ravi,
Can you give more details on how you store an entity in Lucene? What is a doc
type?
What fields do you have?
Cheers
From: java-user@lucene.apache.org At: 12/03/19 12:50:40 To:
java-user@lucene.apache.org
Subject: Multi-IDF for a single term possible?
Hello,
We are using TF-IDF f
>
> it is enough to give each its own field.
>
I kind of over-simplified the problem at hand. Apologies.
DOC_TYPE is just one aspect of the problem. The other is that this is
actually a shared index with multiple users (100-3000 users per
index). There are many hundreds of such shared
IDF is a simple measure to calculate. So, if building a separate index for
each user is not an ideal solution, I suggest calculating these statistics
upfront. Just maintain these statistics for each user, then use them in the
query process.
At search time, you use these sta
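
To make the suggestion above concrete, here is a minimal sketch of keeping per-user statistics outside the index and deriving a per-user IDF from them. The class `PerUserIdf` and its methods are hypothetical names, not part of Lucene. The formula mirrors the one used by Lucene's ClassicSimilarity, 1 + ln((docCount + 1) / (docFreq + 1)). How the resulting value is plugged into query-time scoring is not shown and would depend on the application.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical helper (not a Lucene class): per-user document statistics for IDF. */
public class PerUserIdf {
    // userId -> number of documents that belong to that user
    private final Map<String, Long> docCounts = new HashMap<>();
    // userId -> (term -> number of that user's documents containing the term)
    private final Map<String, Map<String, Long>> docFreqs = new HashMap<>();

    /** Records one document for a user; uniqueTerms must list each distinct term once. */
    public void addDocument(String userId, Iterable<String> uniqueTerms) {
        docCounts.merge(userId, 1L, Long::sum);
        Map<String, Long> df = docFreqs.computeIfAbsent(userId, k -> new HashMap<>());
        for (String term : uniqueTerms) {
            df.merge(term, 1L, Long::sum);
        }
    }

    /** Classic TF-IDF style IDF: 1 + ln((docCount + 1) / (docFreq + 1)). */
    public static double idf(long docFreq, long docCount) {
        return 1.0 + Math.log((docCount + 1.0) / (docFreq + 1.0));
    }

    /** IDF of a term computed only over the given user's documents. */
    public double idf(String userId, String term) {
        long docCount = docCounts.getOrDefault(userId, 0L);
        Map<String, Long> df =
            docFreqs.getOrDefault(userId, Collections.<String, Long>emptyMap());
        long docFreq = df.getOrDefault(term, 0L);
        return idf(docFreq, docCount);
    }
}
```

With this layout the same term can have a different IDF for each user, because both the document count and the document frequency are scoped to that user's documents. The cost is that the statistics must be kept in sync with the index on every add, update, and delete.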
## 3 December 2019, Apache Lucene™ 8.3.1 available
The Lucene PMC is pleased to announce the release of Apache Lucene 8.3.1.
Apache Lucene is a high-performance, full-featured text search engine
library written entirely in Java. It is a technology suitable for
nearly any application that requires
Background: I need to implement document indexing and search for
POIs (points of interest) in an LBS scenario. A POI has a name, an address,
and a location (LatLonPoint), and I want to combine a text query with a
geo-spatial 2D range filter.
The problem is, when I first build a native in-memory index which u