Near real time search in Lucene 7.2.0

2018-03-06 Thread Kumar, Santosh
Hi All,
I am new to the Lucene API and need help with the following issues:

  *   How do I achieve near-real-time (NRT) search in Lucene 7.2.0? I have seen
examples that keep one IndexWriter open for the entire application life cycle
and call indexWriter.getReader() and reader.reopen(), but these no longer
seem to work in 7.2.0. Are there any examples of NRT for Lucene 7.2.0?
  *   How do I prevent the write exception “Lock held by this virtual machine:
${index file path}/write.lock”?

Thank you for the help!!!

Thank you & Regards,
Santosh
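
A minimal sketch of the pattern asked about above, assuming Lucene 7.x on the
classpath. The class name NrtIndex, its methods, and the "name" field are
hypothetical, chosen only for illustration. It keeps one IndexWriter for the
whole application and uses SearcherManager, which serves near-real-time
searchers from that writer. The write.lock error typically appears when a
second IndexWriter is opened on a directory that already has an open writer,
which sharing a single instance avoids.

```java
import java.io.IOException;
import java.nio.file.Path;

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.StringField;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.SearcherManager;
import org.apache.lucene.search.TermQuery;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;

/** Hypothetical helper: one IndexWriter and one SearcherManager per index. */
class NrtIndex implements AutoCloseable {
    private final Directory directory;
    private final IndexWriter writer;
    private final SearcherManager searcherManager;

    NrtIndex(Path indexPath) throws IOException {
        this.directory = FSDirectory.open(indexPath);
        // Only one IndexWriter may hold write.lock for a directory at a time,
        // so this instance is meant to be shared for the application's lifetime.
        this.writer = new IndexWriter(directory, new IndexWriterConfig(new StandardAnalyzer()));
        // Building the manager from the writer yields near-real-time searchers;
        // a null SearcherFactory selects the default one.
        this.searcherManager = new SearcherManager(writer, null);
    }

    /** Indexes one entity name as a single un-analyzed term. */
    void addName(String name) throws IOException {
        Document doc = new Document();
        doc.add(new StringField("name", name, Field.Store.YES));
        writer.addDocument(doc);
    }

    /** Makes earlier writes visible to searchers acquired afterwards; no commit is required. */
    void refresh() throws IOException {
        searcherManager.maybeRefreshBlocking();
    }

    /** Counts exact matches (capped at 10) for the given name. */
    int countExact(String name) throws IOException {
        IndexSearcher searcher = searcherManager.acquire();
        try {
            return searcher.search(new TermQuery(new Term("name", name)), 10).scoreDocs.length;
        } finally {
            // Every acquire() must be paired with a release().
            searcherManager.release(searcher);
        }
    }

    @Override
    public void close() throws IOException {
        searcherManager.close();
        writer.close();
        directory.close();
    }
}
```

In this sketch a document added with addName() is not visible to countExact()
until refresh() has been called. An application would typically call refresh()
after each update or from a periodic background task, and call
writer.commit() separately when the changes must be durable.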


Re: Storing and retrieving Java objects in Lucene

2018-02-19 Thread Kumar, Santosh
Hi Ganesh, thank you for the quick response.
Most of these documents will have more than 10 fields, and in some cases there
is also a foreign key relationship. I will consider JSON, but I also need to
take performance into account.

Thank you and Regards,
Santosh


On 20/02/18, 10:00 AM, "ganesh m" <emailg...@yahoo.co.in> wrote:

Hi Santosh

 >>Furthermore converting the Lucene Documents to Java object and vice-versa
is a tedious task.

This should not be tedious. How big is your document?

One suggestion is to convert your Java object to JSON and store it in
Lucene. You then need to retrieve only one field, and you can easily convert
it back to an object.

Regards
Ganesh






Storing and retrieving Java objects in Lucene

2018-02-19 Thread Kumar, Santosh
Hi,

I have a requirement to store a Java object with multiple fields in the
Lucene index. At application startup I run a select query on each entity
(there are 5 of them as of now, and the number may increase in future) and
create a separate index per entity, i.e. five different indexes as of now (I
cannot use a common index because the entity data must stay separated).
Ideally I would have liked to store only the primary key field, but I need the
rest of the fields upon fetch.
I use this index (basically only the primary key field) to prevent users from
creating duplicate entities, or to offer suggestions like Google's "Did you
mean?" feature. For this purpose I’m using the SpellChecker module to suggest
entities or identify duplicates. Since SpellChecker only returns a String
array, I then have to run a separate search on the index (a QueryParser
search) or a select on the DB to fetch the entire object. Furthermore,
converting Lucene Documents to Java objects and vice versa is a tedious task.
Is there any API or library that can simplify this task? I have heard of the
Compass API, but I am not sure whether it is still recommended. Any examples
or API pointers will be appreciated. Thank you!


Thank you and Regards,
Santosh


Re: Lucene with Database

2017-12-28 Thread Kumar, Santosh
Basically, I need indexing only for fuzzy search on entities. So I’m thinking
of creating an index from the DB tables (for the search term) and storing it
on the server (Cloud Foundry; I have yet to figure out how to achieve this).
Whenever a user creates, updates, or deletes an entity, I would like to update
the index in real time as well. This is mandatory and helps prevent duplicate
entities based on fuzzy search (for example, slsOrd, SalesOrder, etc. are
considered the same).

Thank you for pointing me to Solr; I will give it a try as well.

On 28/12/17, 1:22 PM, "Riccardo Tasso" <riccardo.ta...@gmail.com> wrote:

2017-12-28 6:35 GMT+01:00 Kumar, Santosh <santosh.kuma...@sap.com>:
>
> While looking up for examples of fuzzy search with Lucene, I came across
> examples that demonstrate Lucene with file system predominantly, so was
> wondering if there are any samples on ‘How to use Lucene with DB’ or if the
> Java logic remains same for Filesystem or DB (really sorry I am new to
> Lucene). Any differences or things to consider when the data source are
> different?


If we are talking about indexing documents from a DB or from the filesystem,
it is the same thing.
If you are thinking about using a database to store Lucene's data structures
instead of the filesystem, which is the default option, I would discourage
you. Filesystem storage is the one officially supported.

Since it's your first time with Lucene, have you considered something like
Solr or Elasticsearch, which offer more functionality without the need to
implement it yourself?

Riccardo




Re: Lucene with Database

2017-12-27 Thread Kumar, Santosh
Hi Trejkaz, Evert, Riccardo,

Thank you for your inputs. We have an application which we plan to migrate to
Cloud Foundry, and we have yet to decide on a database; the contenders are
PostgreSQL, MySQL, HANA DB, and MongoDB. The current setup uses HANA DB, which
already has a fuzzy search query. But when we migrate to Cloud Foundry we
might use a different database, so to keep fuzzy search DB-agnostic I think it
would be better to implement it in the Java layer rather than in the DB layer.

While looking for examples of fuzzy search with Lucene, I mostly came across
examples that use Lucene with the file system, so I was wondering whether
there are any samples on ‘How to use Lucene with DB’, or whether the Java
logic remains the same for filesystem and DB (really sorry, I am new to
Lucene). Are there any differences or things to consider when the data
sources are different?

Thank you and Regards,
Santosh
 
On 28/12/17, 4:01 AM, "Trejkaz" wrote:

On Thu, Dec 28, 2017 at 1:07 AM, Riccardo Tasso wrote:
> Hi,
> I am not aware of any lucene integration with rdbms

Derby has a plugin of some sort. I haven't tried it so I have no idea
what it actually does, but it looks like it adds table functions which
you could join to other queries.

https://db.apache.org/derby/docs/10.13/tools/rtoolsoptlucene.html

TX






Lucene with Database

2017-12-21 Thread Kumar, Santosh

Hi,
I’m currently working on a project which has the following scenario:


  1.  I have entities in the DB for which I would like to prevent duplicates
with the same name or a near match; for example, SalesOrder, SlsOrd, SalesOrd,
etc. are all considered the same. For this, I would like to use fuzzy search
and return only entities that meet a matching criterion (say, entities with a
match >=60%).
  2.  How do I approach this use case? Should I create one index (IndexWriter
with RAMDirectory?) for the entire application and update it (in the
background, as a separate microservice) whenever an entity is created, updated,
or removed? I need real-time updates and can’t wait for bulk updates on the
index.
  3.  I can then use the index created above as a lookup when a user tries to
create a new entity, and generate an error or warning message.

If the 2nd point above is fine, is there any general guideline or example
that I can follow for creating a global index for the application? Also, is
there any guideline for using Lucene with a database?

Appreciate your help!!!

Thank you and Regards,
Santosh
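
The "match >=60%" criterion from point 1 can be sketched in plain Java,
assuming the percentage means 1 - (edit distance / length of the longer name),
compared case-insensitively. That definition, the class name NameSimilarity,
and its methods are assumptions made for illustration; this is not how Lucene
scores fuzzy matches. Note that Lucene's own FuzzyQuery accepts at most 2
edits per term, so a pair such as SlsOrd and SalesOrder (4 edits apart) would
need a different approach, such as the one below or an n-gram based one.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

/** Hypothetical helper: near-duplicate check based on Levenshtein distance. */
final class NameSimilarity {
    private NameSimilarity() {}

    /** Dynamic-programming edit distance; insert, delete, and substitute each cost 1. */
    static int editDistance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j;
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            // Reuse the two rows instead of allocating a full matrix.
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    /** Similarity in [0, 1]: 1 minus the distance divided by the longer length. */
    static double similarity(String a, String b) {
        String x = a.toLowerCase(Locale.ROOT);
        String y = b.toLowerCase(Locale.ROOT);
        int longest = Math.max(x.length(), y.length());
        if (longest == 0) {
            return 1.0;
        }
        return 1.0 - (double) editDistance(x, y) / longest;
    }

    /** Returns the existing names whose similarity to the candidate meets the threshold. */
    static List<String> nearMatches(String candidate, List<String> existing, double threshold) {
        List<String> matches = new ArrayList<>();
        for (String name : existing) {
            if (similarity(candidate, name) >= threshold) {
                matches.add(name);
            }
        }
        return matches;
    }
}
```

For example, NameSimilarity.nearMatches("SalesOrd", existingNames, 0.6) would
flag an existing "SalesOrder" but not an "Invoice". Comparing against every
existing name is linear in the number of entities, which is why an index is
still useful once the entity count grows.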