Hi guys, I think I need to simplify my question. After reading it one more time, I realized I touched on many things, and it seems confusing.

It seems that if we index the same document twice, a new document is created. And as per http://lucy.apache.org/docs/c/Lucy/Docs/DocIDs.html, "If you truly need a primary key field, you must define it and populate it yourself". How can we do this? Are there any examples of it? Should I search for the document by its primary key before indexing and, if it already exists, skip indexing it?

Thanks,
Serkan

On Tue, Nov 15, 2016 at 2:22 PM, Serkan Mulayim <serkanmula...@gmail.com> wrote:
> Hi,
>
> As far as I can see, if we add the same document twice, it creates a new
> document. As per http://lucy.apache.org/docs/c/Lucy/Docs/DocIDs.html, "If
> you truly need a primary key field, you must define it and populate it
> yourself". Can you please elaborate on this one? Does it mean choosing a
> field to be the primary key, deleting the document with that primary key,
> and re-adding it? If so, the document might not have been created until we
> commit, so deletion would not be possible, right? Performance would also be
> an issue.
>
> Another solution might be hashing the "primary key" and using it as the
> documentId (but the referred page also says that docIds are ephemeral). If
> the ephemeralness of the docId is not a problem, my concern is
> collisions, considering that I might need to have many documents in the
> same index. This boils down to the birthday problem, and we might not be
> safe in the range of an integer.
>
> Do you have any suggestions about this one?
>
> Thanks,
> Serkan
>
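
To put a rough number on the birthday-problem concern above, here is a minimal sketch in Python. It is not Lucy code. It assumes the hash spreads keys uniformly and that "the range of an integer" means 32 bits; both are my assumptions, and `collision_probability` is just a name made up for this illustration. It uses the standard approximation 1 - exp(-n(n-1)/(2d)) for n keys hashed into d possible values.

```python
import math


def collision_probability(n_keys: int, n_slots: int) -> float:
    """Approximate chance that at least two of n_keys keys hash to the same value.

    Uses the usual birthday approximation 1 - exp(-n(n-1) / (2d)),
    which assumes the hash output is uniform over n_slots values.
    """
    # -expm1(-x) equals 1 - exp(-x) but stays accurate when x is tiny.
    return -math.expm1(-n_keys * (n_keys - 1) / (2.0 * n_slots))


# Assumption: hashed IDs must fit in a 32-bit integer.
SLOTS_32_BIT = 2 ** 32

if __name__ == "__main__":
    for n in (10_000, 100_000, 1_000_000):
        p = collision_probability(n, SLOTS_32_BIT)
        print(f"{n:>9,} documents: collision probability ~ {p:.4f}")
```

Under these assumptions a collision becomes more likely than not somewhere below 100,000 documents, which is why hashing a key into a 32-bit ID looks unsafe for a large index.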