Re: Duplicate docs when merging indices?
On Sat, Aug 21, 2010 at 5:56 PM, Andrew Clegg andrew.cl...@gmail.com wrote:

> Hi,
>
> First off, sorry about previous accidental post, had a sausage-fingered moment.
>
> Anyway... If I merge two indices with CoreAdmin, as detailed here...
>
> http://wiki.apache.org/solr/MergingSolrIndexes
>
> What happens to duplicate documents between the two? i.e. those that have the same unique key.
>
> What decides which copy takes precedence? Will documents get indexed multiple times, or will the second one just get skipped?
>
> Also, does the behaviour vary between CoreAdmin and IndexMergeTool? This thread from a couple of years ago:
>
> http://web.archiveorange.com/archive/v/AAfXfQIiBU7vyQBt6qdk
>
> suggests that IndexMergeTool can result in dupes, unless I'm misinterpreting.

Yes, it will result in duplicate docs. CoreAdmin and IndexMergeTool both use the IndexWriter#addIndexes method, so the behavior will be the same.

--
Regards,
Shalin Shekhar Mangar.
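To make the behaviour described in this reply concrete, here is a toy model in plain Python. It is not Lucene code: it assumes an index can be pictured as a list of documents and that a merge simply appends one list to another, which is how the reply describes the result. The function names and the keep-the-last-copy rule are hypothetical; according to the reply, the merge itself makes no such choice.

```python
# Toy model only: an "index" is a list of dicts, and "id" stands in for the unique key.

def merge_indices(*indices):
    """Append-only merge: every document is kept, even if its id repeats."""
    merged = []
    for index in indices:
        merged.extend(index)
    return merged


def dedupe_keep_last(docs, key="id"):
    """Hypothetical clean-up pass: keep the last copy seen for each key."""
    latest = {}
    for doc in docs:
        latest[doc[key]] = doc  # later copies overwrite earlier ones
    return list(latest.values())


index_a = [{"id": "1", "title": "old"}, {"id": "2", "title": "only in A"}]
index_b = [{"id": "1", "title": "new"}, {"id": "3", "title": "only in B"}]

merged = merge_indices(index_a, index_b)
print(len(merged))                    # 4: id "1" appears twice
print(len(dedupe_keep_last(merged)))  # 3
```

Which copy should win is an application decision; keeping the last one is only the choice made for this sketch.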
Re: Duplicate docs when merging indices?
On Sat, 21 Aug 2010 05:26:59 -0700 (PDT) Andrew Clegg andrew.cl...@gmail.com wrote:

[...]
> If I merge two indices with CoreAdmin, as detailed here...
>
> http://wiki.apache.org/solr/MergingSolrIndexes
>
> What happens to duplicate documents between the two? i.e. those that have the same unique key.
>
> What decides which copy takes precedence? Will documents get indexed multiple times, or will the second one just get skipped?
[...]

Have not used CoreAdmin, but with MergeTool, know from personal experience that there would be duplicates created. I imagine that the same is the case for CoreAdmin as Solr/Lucene allows duplicate IDs.

Regards,
Gora
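Since duplicate IDs can end up in a merged index, one possible way to look for them afterwards is a facet query on the unique key field. The sketch below only builds the request URL. The host, port, and the field name `id` are assumptions, and the helper name is hypothetical; `facet.field`, `facet.mincount`, and `facet.limit` are standard Solr faceting parameters.

```python
from urllib.parse import urlencode


def duplicate_id_query(base_url, key_field="id"):
    """Build a select URL whose facet counts list key values that occur more than once."""
    params = [
        ("q", "*:*"),             # match every document
        ("rows", "0"),            # only the facet counts are of interest
        ("facet", "true"),
        ("facet.field", key_field),
        ("facet.mincount", "2"),  # hide values that occur only once
        ("facet.limit", "-1"),    # no cap on the number of values returned
    ]
    return base_url.rstrip("/") + "/select?" + urlencode(params)


url = duplicate_id_query("http://localhost:8983/solr")
print(url)
```

Faceting on a unique key touches every distinct value, so this may be slow on a large index.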
Re: Merging Indices
On Fri, Dec 5, 2008 at 5:09 AM, ashokc [EMAIL PROTECTED] wrote:

> The SOLR wiki says
>
> 3. Make sure both indexes you want to merge are closed.
>
> What exactly does 'closed' mean?

I think that would mean that the IndexReader and IndexWriter on that index are closed.

> 1. Do I need to stop SOLR search on both indexes before running the merge command? So a brief downtime is required?

I think so.

> Or do I simply prevent any 'updates/deletes' to these indices during the merge time so they can still serve up results (read only?) while I am creating a new merged index?
>
> 2. Before the new index replaces the old index, do I need to stop SOLR for that instance? Or can I simply move the old index out and place the new index in the same place, without having to stop SOLR

The rsync-based replication in Solr uses a similar scheme. It creates hardlinks to the new index files over the old ones.

> 3. If SOLR has to be stopped during the merge operation, can we work with a redundant/failover instance and stagger the merge so the search service will not go down? Any guidelines here are welcome.

It is not very clear what you are actually trying to do. Why do you even need to merge indices? Are you creating your index outside of Solr? Just curious to know your use-case.

--
Regards,
Shalin Shekhar Mangar.
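The hardlink idea mentioned in this reply can be illustrated with a small, self-contained sketch. It is not Solr's replication script: the directory names, the file name `segments_2`, and the helper `hardlink_files` are all made up for the example, and it only shows how a new set of files can be linked into an index directory over same-named old ones.

```python
import os
import tempfile


def hardlink_files(src_dir, dst_dir):
    """Hard-link each regular file in src_dir into dst_dir, replacing same-named files.

    Files already in dst_dir that are absent from src_dir are left alone.
    """
    os.makedirs(dst_dir, exist_ok=True)
    for name in os.listdir(src_dir):
        src = os.path.join(src_dir, name)
        if not os.path.isfile(src):
            continue
        dst = os.path.join(dst_dir, name)
        if os.path.exists(dst):
            os.remove(dst)  # os.link refuses to overwrite an existing file
        os.link(src, dst)


with tempfile.TemporaryDirectory() as root:
    snapshot = os.path.join(root, "snapshot")
    index = os.path.join(root, "index")
    os.makedirs(snapshot)
    with open(os.path.join(snapshot, "segments_2"), "w") as f:
        f.write("new segment data")
    hardlink_files(snapshot, index)
    # Both paths should now refer to the same underlying file.
    same = os.path.samefile(os.path.join(snapshot, "segments_2"),
                            os.path.join(index, "segments_2"))
    print(same)
```

A hard link costs no extra disk space for the file contents, which is presumably why it suits index snapshots; the remove-then-link step here is not atomic, so a real script would need more care.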
Re: Merging Indices
On Thu, Dec 4, 2008 at 6:39 PM, ashokc [EMAIL PROTECTED] wrote:

> The SOLR wiki says
>
> 3. Make sure both indexes you want to merge are closed.
>
> What exactly does 'closed' mean?

If you do a commit, and then prevent updates, the index should be closed (no open IndexWriter).

> 1. Do I need to stop SOLR search on both indexes before running the merge command? So a brief downtime is required? Or do I simply prevent any 'updates/deletes' to these indices during the merge time so they can still serve up results (read only?) while I am creating a new merged index?

Preventing updates/deletes should be sufficient.

> 2. Before the new index replaces the old index, do I need to stop SOLR for that instance? Or can I simply move the old index out and place the new index in the same place, without having to stop SOLR

Yes, simply moving the index should work if you are careful to avoid any updates since the last commit.

> 3. If SOLR has to be stopped during the merge operation, can we work with a redundant/failover instance and stagger the merge so the search service will not go down? Any guidelines here are welcome.
>
> Thanks
>
> - ashok
>
> --
> View this message in context: http://www.nabble.com/Merging-Indices-tp20845009p20845009.html
> Sent from the Solr - User mailing list archive at Nabble.com.
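The "do a commit, then prevent updates" step from this reply could start with a commit request like the one sketched below. The snippet only builds the request and does not send it. The base URL is an assumption and the helper name is hypothetical; `<commit/>` posted to the update handler is Solr's XML commit message. How updates are blocked afterwards is left to the application.

```python
from urllib.request import Request


def build_commit_request(solr_url):
    """Build (but do not send) a POST carrying Solr's XML <commit/> message."""
    return Request(
        solr_url.rstrip("/") + "/update",
        data=b"<commit/>",
        headers={"Content-Type": "text/xml; charset=utf-8"},
        method="POST",
    )


req = build_commit_request("http://localhost:8983/solr")
print(req.get_method(), req.full_url)
# Sending it would be: urllib.request.urlopen(req)
```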
Re: Merging Indices
Thanks for the help, Yonik and Shalin. It really makes it easy for me if I do not have to stop/start the SOLR app during the merge operations.

The reason I have to do this many times a day is that I am implementing a simple-minded entity-extraction procedure for the content I am indexing. I have a user-defined taxonomy under which the current documents, and any new documents, should be classified. The taxonomy defines the nested facet fields for SOLR. When a new document is posted, the user expects to have it available in the right facet right away.

My classification procedure is as follows when a new document is added.

1. Create a new temporary index with that document (no taxonomy fields at this time).
2. Search this index with each of the taxonomy terms (synonyms are employed as well, through synonyms.txt) and find out which of these categories is a hit for this document.
3. Add a new field ... line into the document for each category that is a match for this document.
4. Repost this updated document. Now I have a new index that facets this document the same way the big index does.
5. Merge these two indices so that the new document is also part of the big index.
6. Delete the temporary index.

The reason for a new temporary index is that step 2 is A LOT quicker with a single document (or a handful). If I simply posted this new doc into the big index and then tried to classify it, the search would take a while. I have over 200 nested taxonomy fields to search over.

Are there better approaches?

Thanks

- ashok

--
View this message in context: http://www.nabble.com/Merging-Indices-tp20845009p20859513.html
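Steps 2 and 3 of the classification procedure in this message (match a document against taxonomy terms and their synonyms, then record the matching categories on the document) can be modeled in a few lines of plain Python. This is only a toy: the taxonomy, the category names, the `category` field, and the whitespace tokenization are all invented for illustration, and it does not reproduce whatever analysis Solr applies through synonyms.txt.

```python
def classify(text, taxonomy):
    """Return the categories whose term or synonyms occur as words in the text."""
    words = set(text.lower().split())
    matches = []
    for category, terms in taxonomy.items():
        if words & {t.lower() for t in terms}:
            matches.append(category)
    return matches


# Hypothetical taxonomy: category path -> term plus synonyms (single words only).
taxonomy = {
    "animals/dogs": ["dog", "puppy", "hound"],
    "animals/cats": ["cat", "kitten"],
    "vehicles/cars": ["car", "automobile"],
}

doc = {"id": "42", "text": "My puppy chased the neighbour's car"}
doc["category"] = classify(doc["text"], taxonomy)
print(doc["category"])  # ['animals/dogs', 'vehicles/cars']
```

Whether something like this could stand in for the temporary index depends on how much the matching relies on Solr's own query and analysis behaviour.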
Merging Indices
The SOLR wiki says

3. Make sure both indexes you want to merge are closed.

What exactly does 'closed' mean?

1. Do I need to stop SOLR search on both indexes before running the merge command? So a brief downtime is required? Or do I simply prevent any 'updates/deletes' to these indices during the merge time so they can still serve up results (read only?) while I am creating a new merged index?

2. Before the new index replaces the old index, do I need to stop SOLR for that instance? Or can I simply move the old index out and place the new index in the same place, without having to stop SOLR?

3. If SOLR has to be stopped during the merge operation, can we work with a redundant/failover instance and stagger the merge so the search service will not go down? Any guidelines here are welcome.

Thanks

- ashok
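For reference, the merge command asked about here would, with Lucene's IndexMergeTool, take the target directory first and the source indexes after it. The sketch below only assembles that command line and does not run it. The jar names and directory paths are placeholders (real jar files normally carry version numbers), and the helper name is hypothetical.

```python
import shlex


def merge_command(merged_dir, source_dirs, classpath="lucene-core.jar:lucene-misc.jar"):
    """Build the argv for Lucene's IndexMergeTool: merged index first, then sources."""
    if len(source_dirs) < 2:
        raise ValueError("need at least two source indexes to merge")
    return ["java", "-cp", classpath,
            "org.apache.lucene.misc.IndexMergeTool", merged_dir, *source_dirs]


cmd = merge_command("./newindex", ["./app1/solr/data/index", "./app2/solr/data/index"])
print(" ".join(shlex.quote(part) for part in cmd))
```

Per the replies in this thread, the source indexes should have no open IndexWriter while the command runs.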