[ANNOUNCE] Apache Lucene 8.3.1 released

2019-12-03 Thread Ishan Chattopadhyaya
3 December 2019, Apache Lucene™ 8.3.1 available. The Lucene PMC is pleased to announce the release of Apache Lucene 8.3.1. Apache Lucene is a high-performance, full-featured text search engine library written entirely in Java. It is a technology suitable for nearly any application that

Need suggestions on implementing a custom query (offload R-tree filter to fully in-memory) on Lucene-8.3

2019-12-03 Thread 小鱼儿
Background: I need to implement document indexing and search for POIs (points of interest) in an LBS scenario. A POI has a name, an address, and a location (LatLonPoint), and I want to combine a text query with a geospatial 2D range filter. The problem is, when I first build a native in-memory index which

Re: Multi-IDF for a single term possible?

2019-12-03 Thread Ameer Albahem
IDF is a simple measure to calculate. So, if building a separate index for each user is not an ideal solution, I suggest calculating these statistics upfront. Just maintain these statistics for each user, then use them in the query process. At search time, you use these
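The suggestion above, keeping IDF statistics per user and looking them up at query time, could be sketched roughly as follows. This is only an illustration under stated assumptions: it uses the classic TF-IDF formula `log((docCount + 1) / (docFreq + 1)) + 1`, and the class `PerUserIdf` and its methods are hypothetical names, not part of the Lucene API. How the resulting value is fed into scoring is not shown.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical helper (not a Lucene class) that tracks per-user term statistics. */
public class PerUserIdf {
    // user -> number of documents belonging to that user
    private final Map<String, Long> docCounts = new HashMap<>();
    // user -> (term -> number of that user's documents containing the term)
    private final Map<String, Map<String, Long>> docFreqs = new HashMap<>();

    /** Records one document for a user; the terms passed in must be distinct. */
    public void addDocument(String user, Iterable<String> distinctTerms) {
        docCounts.merge(user, 1L, Long::sum);
        Map<String, Long> freqs = docFreqs.computeIfAbsent(user, u -> new HashMap<>());
        for (String term : distinctTerms) {
            freqs.merge(term, 1L, Long::sum);
        }
    }

    /** Classic IDF (assumed formula): log((docCount + 1) / (docFreq + 1)) + 1. */
    public static double idf(long docFreq, long docCount) {
        return Math.log((docCount + 1) / (double) (docFreq + 1)) + 1.0;
    }

    /** IDF of a term computed only from the given user's documents. */
    public double idfFor(String user, String term) {
        long docCount = docCounts.getOrDefault(user, 0L);
        Map<String, Long> freqs = docFreqs.getOrDefault(user, Collections.emptyMap());
        long docFreq = freqs.getOrDefault(term, 0L);
        return idf(docFreq, docCount);
    }
}
```

With statistics kept this way, the same term can get a different IDF for each user, because `idfFor` only looks at that user's document count and document frequencies.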

Re: Multi-IDF for a single term possible?

2019-12-03 Thread Ravikumar Govindarajan
> > it is enough to give each its own field. > I kind of over-simplified the problem at hand. Apologies. DOC_TYPE is just one aspect of the problem. The other is that it is actually a shared index with multiple users (100-3000 users per index). There are many hundreds of such

Re: Multi-IDF for a single term possible?

2019-12-03 Thread Diego Ceccarelli (BLOOMBERG/ LONDON)
Hi Ravi, Can you give more details on how you store an entity in Lucene? What is a doc type? What fields do you have? Cheers From: java-user@lucene.apache.org At: 12/03/19 12:50:40 To: java-user@lucene.apache.org Subject: Multi-IDF for a single term possible? Hello, We are using TF-IDF

Re: Multi-IDF for a single term possible?

2019-12-03 Thread Robert Muir
it is enough to give each its own field. On Tue, Dec 3, 2019 at 7:57 AM Adrien Grand wrote: > Is there any reason why you are not storing each DOC_TYPE in its own index? > > On Tue, Dec 3, 2019 at 1:50 PM Ravikumar Govindarajan > wrote: > > > > Hello, > > > > We are using TF-IDF for scoring

Re: Multi-IDF for a single term possible?

2019-12-03 Thread Adrien Grand
Is there any reason why you are not storing each DOC_TYPE in its own index? On Tue, Dec 3, 2019 at 1:50 PM Ravikumar Govindarajan wrote: > > Hello, > > We are using TF-IDF for scoring (Yet to migrate to BM25). Different > entities (DOC_TYPES) are crunched & stored together in a single index. > >

Multi-IDF for a single term possible?

2019-12-03 Thread Ravikumar Govindarajan
Hello, We are using TF-IDF for scoring (Yet to migrate to BM25). Different entities (DOC_TYPES) are crunched & stored together in a single index. When it comes to IDF, I find that there is a single value computed across documents & stored as part of TermStats, whereas our documents are not