Near real time search in Lucene 7.2.0
Hi All,

I am new to the Lucene API and need help with the following issues:

* How can I achieve near-real-time (NRT) search in Lucene 7.2.0? I have seen examples that keep one IndexWriter open for the entire application life cycle and call indexWriter.getReader() and reader.reopen(), but these no longer seem to work in 7.2.0. Are there any examples of NRT in Lucene 7.2.0?
* How can I prevent the write exception "Lock held by this virtual machine: ${index file path}/write.lock"?

Thank you for the help!

Thank you & Regards,
Santosh
Re: Storing and retrieving Java objects in Lucene
Hi Ganesh,

Thank you for the quick response. Most of these documents will have more than 10 fields, and in some cases there is also a foreign key relationship. I will consider JSON, but I also need to consider the performance factor.

Thank you and Regards,
Santosh

On 20/02/18, 10:00 AM, "ganesh m" <emailg...@yahoo.co.in> wrote:

Hi Santosh,

>> Furthermore converting the Lucene Documents to Java object and vice-versa is a tedious task.

This should not be tedious. How big is your document? One suggestion is to convert your Java object to JSON and store it in Lucene. You then need to retrieve only one field, and you can easily convert it back to an object.

Regards,
Ganesh

On 20-02-2018 08:34, Kumar, Santosh wrote:
> Hi,
>
> I have a requirement to store a Java object with multiple fields into the Lucene index. Basically, at the application startup I run a select query on entities (there are 5 of them as of now and may increase in future) and then create an index for each of these entities (5) i.e. five different indexes as of now (cannot have a common index. Need separation of entity data). Ideally I would have liked to store only primary key field, but I need rest of the fields upon fetch.
> I use this index (basically only the primary key field) to prevent users from creating duplicate entities or suggest them like a Did you mean (Google)? feature. For this purpose, I'm using SpellChecker module to suggest entities or identify duplicates. Since, Spell checker only returns a String array, I again have to run a select separate search on the index (QueryParser search) or run select on the DB to fetch the entire object. Furthermore converting the Lucene Documents to Java object and vice-versa is a tedious task. Is there any API or library that can simplify this task? I have heard of Compass API, but not sure if it is still recommended. Any examples of the same or APIs will be appreciated. Thank you !!!
>
> Thank you and Regards,
> Santosh
Storing and retrieving Java objects in Lucene
Hi,

I have a requirement to store a Java object with multiple fields in a Lucene index. At application startup I run a select query on each entity (there are 5 of them as of now, and the number may increase in future) and then create an index for each entity, i.e. five separate indexes as of now. (I cannot use a common index; the entity data must be kept separate.) Ideally I would have liked to store only the primary key field, but I need the rest of the fields on fetch.

I use this index (basically only the primary key field) to prevent users from creating duplicate entities, or to suggest existing ones, like Google's "Did you mean?" feature. For this purpose I am using the SpellChecker module to suggest entities or identify duplicates. Since SpellChecker only returns a String array, I then have to run a separate search on the index (a QueryParser search) or a select on the DB to fetch the entire object.

Furthermore, converting Lucene Documents to Java objects and vice versa is a tedious task. Is there any API or library that can simplify this task? I have heard of the Compass API, but I am not sure whether it is still recommended. Any examples or API pointers will be appreciated.

Thank you!

Thank you and Regards,
Santosh
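For readers with the same object-to-document mapping problem, here is a minimal sketch of one way to reduce the boilerplate. It is not a Lucene API: it uses plain Java reflection to copy an object's public String fields into a name/value map and back. The map stands in for the stored fields of a Lucene Document (which are read back by name), and `FieldMapper`, `Entity`, and its fields are hypothetical names chosen for illustration.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.util.LinkedHashMap;
import java.util.Map;

class FieldMapper {

    // Hypothetical entity; only public, non-static String fields are mapped.
    public static class Entity {
        public String id;
        public String name;
        public String description;

        public Entity() {}

        public Entity(String id, String name, String description) {
            this.id = id;
            this.name = name;
            this.description = description;
        }
    }

    private static boolean isMappable(Field f) {
        return f.getType() == String.class && !Modifier.isStatic(f.getModifiers());
    }

    // Object -> name/value pairs (a stand-in for stored document fields).
    static Map<String, String> toFields(Object bean) {
        Map<String, String> fields = new LinkedHashMap<>();
        try {
            for (Field f : bean.getClass().getFields()) {
                Object value = f.get(bean);
                if (isMappable(f) && value != null) {
                    fields.put(f.getName(), (String) value);
                }
            }
        } catch (IllegalAccessException ex) {
            throw new IllegalStateException("Cannot read fields of " + bean.getClass().getName(), ex);
        }
        return fields;
    }

    // Name/value pairs -> object; the target type needs a public no-arg constructor.
    static <T> T fromFields(Map<String, String> fields, Class<T> type) {
        try {
            T bean = type.getDeclaredConstructor().newInstance();
            for (Field f : type.getFields()) {
                if (isMappable(f) && fields.containsKey(f.getName())) {
                    f.set(bean, fields.get(f.getName()));
                }
            }
            return bean;
        } catch (ReflectiveOperationException ex) {
            throw new IllegalStateException("Cannot map fields to " + type.getName(), ex);
        }
    }

    public static void main(String[] args) {
        Entity original = new Entity("42", "SalesOrder", "Order header");
        Entity copy = fromFields(toFields(original), Entity.class);
        System.out.println(copy.id + " " + copy.name); // prints "42 SalesOrder"
    }
}
```

With Lucene on the classpath, the same loop could add one stored field per map entry instead of filling a map. Non-String fields and foreign key relationships would need explicit conversion, which this sketch leaves out.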
Re: Lucene with Database
Basically, I need indexing only for fuzzy search on entities. So I am thinking of creating an index from the DB tables (for the search term) and storing it on the server (Cloud Foundry; I have yet to figure out how to achieve this). Whenever a user creates, updates, or deletes an entity, I would like to update the index in real time as well. This is mandatory and helps prevent duplicate entities based on fuzzy search (for example, slsOrd, SalesOrder, etc. are considered the same).

Thank you for pointing me to Solr; I will give it a try as well.

On 28/12/17, 1:22 PM, "Riccardo Tasso" <riccardo.ta...@gmail.com> wrote:

2017-12-28 6:35 GMT+01:00 Kumar, Santosh <santosh.kuma...@sap.com>:

> While looking up for examples of fuzzy search with Lucene, I came across
> examples that demonstrate Lucene with file system predominantly, so was
> wondering if there are any samples on 'How to use Lucene with DB' or if the
> Java logic remains same for Filesystem or DB (really sorry I am new to
> Lucene). Any differences or things to consider when the data source are
> different?

If we are speaking of indexing documents from a DB or from a filesystem, it is the same thing.

If you are thinking about using a database to store the Lucene data structures, instead of the filesystem, which is the default option, I would discourage you. Filesystem storage is the one officially supported.

Since it's your first time with Lucene, have you considered something like Solr or Elasticsearch, which offer more functionality without the need to implement it yourself?

Riccardo
Re: Lucene with Database
Hi Trejkaz, Evert, Riccardo,

Thank you for your inputs.

We have an application which we plan to migrate to Cloud Foundry, and we are yet to decide on a database; the contenders are PostgreSQL, MySQL, HANA DB, and MongoDB. In the current setup we use HANA DB, which already has a fuzzy search query. But when we migrate to Cloud Foundry we might use a different database, and to keep fuzzy search DB-agnostic I think it would be better to have fuzzy search in the Java layer rather than in the DB layer.

While looking for examples of fuzzy search with Lucene, I came across examples that predominantly demonstrate Lucene with the file system, so I was wondering whether there are any samples on "How to use Lucene with a DB", or whether the Java logic remains the same for a filesystem or a DB (really sorry, I am new to Lucene). Are there any differences or things to consider when the data sources are different?

Thank you and Regards,
Santosh

On 28/12/17, 4:01 AM, "Trejkaz" wrote:

On Thu, Dec 28, 2017 at 1:07 AM, Riccardo Tasso wrote:
> Hi,
> I am not aware of any lucene integration with rdbms

Derby has a plugin of some sort. I haven't tried it so I have no idea what it actually does, but it looks like it adds table functions which you could join to other queries.

https://db.apache.org/derby/docs/10.13/tools/rtoolsoptlucene.html

TX
Lucene with Database
Hi,

I'm currently working on a project which has the following scenario:

1. I have entities in a DB on which I would like to prevent duplicates by the same name or a near match; for example, SalesOrder, SlsOrd, SalesOrd, etc. are all considered the same. For this, I would like to use fuzzy search and return only entities that meet a matching criterion (say, entities with a match >= 60%).
2. How do I approach this use case? Should I create one index (IndexWriter with RAMDirectory?) for the entire application and keep updating it in the background as a separate microservice, so that whenever an entity is created, updated, or removed the index is updated as well? I need real-time updates and can't wait for bulk updates on the index.
3. I can then use the index created above as a lookup when a user tries to create a new entity, and generate an error or warning message.

If the 2nd point above is fine, is there any general guideline or example that I can follow for creating a global index for the application? Also, is there any guideline for using Lucene with a database?

Appreciate your help!

Thank you and Regards,
Santosh
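To make the "match >= 60%" criterion in point 1 concrete, here is a minimal sketch in plain Java with no Lucene dependency. It assumes that "match" means a case-insensitive Levenshtein edit distance normalised by the longer name's length; that definition is an assumption for illustration, not how Lucene scores fuzzy matches, and `FuzzyDuplicateCheck` and its methods are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

class FuzzyDuplicateCheck {

    // Levenshtein edit distance, computed with two rolling rows.
    static int editDistance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j;
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    // Similarity as an integer percentage of the longer name's length (assumed definition).
    static int similarityPercent(String a, String b) {
        String x = a.toLowerCase(Locale.ROOT);
        String y = b.toLowerCase(Locale.ROOT);
        int maxLen = Math.max(x.length(), y.length());
        if (maxLen == 0) {
            return 100;
        }
        return 100 * (maxLen - editDistance(x, y)) / maxLen;
    }

    // Existing names whose similarity to the candidate meets the threshold.
    static List<String> nearDuplicates(String candidate, List<String> existing, int minPercent) {
        List<String> matches = new ArrayList<>();
        for (String name : existing) {
            if (similarityPercent(candidate, name) >= minPercent) {
                matches.add(name);
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        List<String> existing = Arrays.asList("SalesOrder", "Invoice");
        System.out.println(nearDuplicates("SlsOrd", existing, 60)); // prints [SalesOrder]
    }
}
```

Under this definition SlsOrd is four insertions away from SalesOrder, which is exactly 60%, so it sits right on the threshold. A linear scan like this only suits small entity sets; an index is what makes the lookup scale.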