Re: Iterating Over All Documents On a Changing Index
Thanks for the clarification. I have written my own logic that tracks changes and ignores documents that have been written or deleted since the reindex started.

On Mon, Oct 21, 2019, 4:58 PM Adrien Grand wrote:
> This is the right place to ask these questions indeed.
>
> This is a good way to iterate over documents. Regarding your 2nd
> question, Lucene IndexReaders are point-in-time views of the data, so
> changes won't become visible in-place. The tricky problem with this
> kind of problem is usually to deal with documents that are getting
> indexed after you pulled a new reader and while you are in the process
> of reindexing.
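For readers following the thread, the change-tracking idea described in this reply might look roughly like the sketch below. It is not the code from the gist and it does not use Lucene: the index is modelled as plain in-memory maps, and every name in it (`TrackedIndex`, `beginReindex`, `reindexIfUntouched`, the `v1:`/`v2:` schemas) is hypothetical. It assumes that live writes already use the new schema once the reindex has started, and that the "was this document touched?" check and the rewrite happen atomically (here via `synchronized`).

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.function.UnaryOperator;

// Hypothetical in-memory stand-in for an index; not Lucene API and not the gist's code.
final class TrackedIndex {
    private final Map<String, String> sources = new HashMap<>(); // id -> stored source (the BSON in the thread)
    private final Map<String, String> indexed = new HashMap<>(); // id -> form derived by the current rules
    private final Set<String> touched = new HashSet<>();         // ids written or deleted since the reindex started
    private UnaryOperator<String> schema;
    private boolean reindexing = false;

    TrackedIndex(UnaryOperator<String> schema) {
        this.schema = schema;
    }

    // Normal write path: index with the current schema and remember the id while a reindex runs.
    synchronized void write(String id, String source) {
        sources.put(id, source);
        indexed.put(id, schema.apply(source));
        if (reindexing) touched.add(id);
    }

    // Normal delete path: also remembered, so the reindex cannot resurrect the document.
    synchronized void delete(String id) {
        sources.remove(id);
        indexed.remove(id);
        if (reindexing) touched.add(id);
    }

    // Switch to the new schema and hand back a point-in-time copy of the stored sources.
    synchronized Map<String, String> beginReindex(UnaryOperator<String> newSchema) {
        schema = newSchema;
        touched.clear();
        reindexing = true;
        return new HashMap<>(sources);
    }

    // Reindex path: rewrite a snapshot document unless live traffic touched it after the snapshot.
    synchronized boolean reindexIfUntouched(String id, String snapshotSource) {
        if (touched.contains(id)) return false;
        indexed.put(id, schema.apply(snapshotSource));
        return true;
    }

    synchronized void endReindex() {
        reindexing = false;
        touched.clear();
    }

    synchronized String indexedForm(String id) {
        return indexed.get(id);
    }

    // Runs one scenario: an update and a delete arrive after the snapshot was taken.
    static TrackedIndex demo() {
        TrackedIndex index = new TrackedIndex(s -> "v1:" + s);
        index.write("a", "alpha");
        index.write("b", "beta");
        index.write("c", "gamma");

        Map<String, String> snapshot = index.beginReindex(s -> "v2:" + s);
        index.write("b", "beta-updated"); // live update during the reindex
        index.delete("c");                // live delete during the reindex

        for (Map.Entry<String, String> e : snapshot.entrySet()) {
            index.reindexIfUntouched(e.getKey(), e.getValue());
        }
        index.endReindex();
        return index;
    }

    public static void main(String[] args) {
        TrackedIndex index = demo();
        System.out.println(index.indexedForm("a")); // v2:alpha
        System.out.println(index.indexedForm("b")); // v2:beta-updated
        System.out.println(index.indexedForm("c")); // null
    }
}
```

In this model the stale snapshot copy of `b` never overwrites the live update, and the deleted `c` is not brought back. How the same bookkeeping maps onto a real IndexWriter and reader is left open here, since the thread does not show it.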
Re: Iterating Over All Documents On a Changing Index
This is the right place to ask these questions indeed.

This is a good way to iterate over documents. Regarding your 2nd question, Lucene IndexReaders are point-in-time views of the data, so changes made after a reader is opened won't become visible through it. The tricky part of this kind of problem is usually dealing with documents that are indexed after you pulled a new reader and while you are still reindexing.

On Sat, Oct 19, 2019 at 1:35 AM Matt Davis wrote:
> I have two questions:
> 1) Is this a good way to iterate over the documents
> 2) How can I manage documents changing when I am doing this. New documents
> coming in should be fine I believe but changes to existing documents could
> be lost if I understand correctly.

--
Adrien

-
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org
Iterating Over All Documents On a Changing Index
Hi All,

I am working on implementing an in-place reindex using Lucene. In my case, I have a BSON document stored in a binary field and a set of rules that pull fields out of the BSON and index them into different Lucene fields with different analyzers. I would like to be able to change these rules / schema and then iterate over the documents, indexing them using the new schema.

I have come up with the following code block:
https://gist.github.com/mdavis95/f600e0a8233d0a1232eff77645d1dc8a

I have two questions:
1) Is this a good way to iterate over the documents?
2) How can I manage documents changing while I am doing this? New documents coming in should be fine, I believe, but changes to existing documents could be lost, if I understand correctly.

I hope that this is the right place to ask this question, and I apologize if this is obvious or has been asked and answered.

Thanks,
Matt