Re: [lucy-user] Re: Regarding document Ids

Peter Karman Wed, 16 Nov 2016 11:23:58 -0800

Serkan Mulayim wrote on 11/16/16, 1:17 PM:

Hi guys,


I think I need to simplify my question. After reading it one more time, I
realized I touched on many things, and it seems confusing.

It seems like if we index the same document twice, a new document is
created. And as per http://lucy.apache.org/docs/c/Lucy/Docs/DocIDs.html, "If
you truly need a primary key field, you must define it and populate it
yourself". How can we do this? Are there any examples of this? Should I
search for the document with the primary key before indexing and, if it
exists, not index it?


What I do in all my apps is call delete_by_term on my primary key field before re-adding the document:
https://metacpan.org/pod/distribution/Lucy/lib/Lucy/Index/Indexer.pod#delete_by_term

I have my own primary key system that varies based on the application. Sometimes it is a URI, sometimes a db PK. I maintain the document integrity myself.
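
A rough sketch of that delete-then-add pattern, written as a toy Python model rather than real Lucy code. ToyIndex, upsert, and the "uri" key field are made up for illustration; only the name delete_by_term mirrors the actual Indexer method, and a real index would also need a commit before the change is visible to searchers.

```python
class ToyIndex:
    """Toy in-memory stand-in for a search index. This is NOT Lucy's API."""

    def __init__(self):
        self._docs = {}     # internal doc id -> dict of fields
        self._next_id = 1   # internal ids are unrelated to any primary key

    def add_doc(self, fields):
        # Always creates a new document, even if an identical one exists.
        doc_id = self._next_id
        self._next_id += 1
        self._docs[doc_id] = dict(fields)
        return doc_id

    def delete_by_term(self, field, term):
        # Remove every document whose `field` exactly equals `term`.
        stale = [i for i, d in self._docs.items() if d.get(field) == term]
        for i in stale:
            del self._docs[i]
        return len(stale)

    def search(self, field, term):
        return [d for d in self._docs.values() if d.get(field) == term]


def upsert(index, key_field, fields):
    """Hypothetical helper: delete by primary key, then add the new version."""
    index.delete_by_term(key_field, fields[key_field])
    return index.add_doc(fields)


index = ToyIndex()

# Plain add twice: the index happily stores two copies.
index.add_doc({"uri": "/a", "body": "first"})
index.add_doc({"uri": "/a", "body": "first"})
duplicates = len(index.search("uri", "/a"))

# Delete-then-add keeps exactly one document per key.
upsert(index, "uri", {"uri": "/a", "body": "second"})
after_upsert = index.search("uri", "/a")
```

There is no need to search for the key first: deleting a term that matches nothing is harmless, so the same two calls cover both the new-document and the re-index case.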


One example from how Dezi solves this more generally:

https://metacpan.org/source/KARMAN/Dezi-App-0.014/lib/Dezi/Lucy/Indexer.pm#L451

Lucy isn't an RDBMS. It just tokenizes the fields you shove into it, and retrieves very quickly.



--
Peter Karman  .  http://peknet.com/  .  pe...@peknet.com
