Re: Duplicate docs when merging indices?

2010-08-22 Thread Shalin Shekhar Mangar
On Sat, Aug 21, 2010 at 5:56 PM, Andrew Clegg andrew.cl...@gmail.com wrote:


 Hi,

 First off, sorry about the previous accidental post, had a sausage-fingered
 moment.

 Anyway...

 If I merge two indices with CoreAdmin, as detailed here...

 http://wiki.apache.org/solr/MergingSolrIndexes

 What happens to duplicate documents between the two, i.e. those that have
 the same unique key?

 What decides which copy takes precedence? Will documents get indexed
 multiple times, or will the second one just get skipped?

 Also, does the behaviour vary between CoreAdmin and IndexMergeTool? This
 thread from a couple of years ago:

 http://web.archiveorange.com/archive/v/AAfXfQIiBU7vyQBt6qdk

 suggests that IndexMergeTool can result in dupes, unless I'm
 misinterpreting.


Yes, it will result in duplicate docs. CoreAdmin and IndexMergeTool both use
the IndexWriter#addIndexes method, which does not look at Solr's unique key,
so the behavior will be the same.
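To make the point concrete, here is a toy Python model (not Lucene's actual code) contrasting an addIndexes-style merge, which simply appends every document, with re-posting the same documents through Solr's update handler with its default overwrite behaviour, where a later document replaces an earlier one that has the same unique key. Documents are plain dicts, and the field name "id" is an assumption standing in for whatever the schema declares as uniqueKey.

```python
# Toy model only: an "index" is a list of dicts, and "id" stands in for
# the schema's uniqueKey field (an assumption for this sketch).

def add_indexes(target, *sources):
    """addIndexes-style merge: every document from every source is appended.
    Lucene has no notion of Solr's uniqueKey, so nothing is deduplicated."""
    merged = list(target)
    for source in sources:
        merged.extend(source)
    return merged


def add_with_overwrite(target, *sources):
    """Contrast: posting documents through Solr's update handler, where a
    later document replaces an earlier one with the same uniqueKey."""
    by_id = {}
    for doc in target:
        by_id[doc["id"]] = doc
    for source in sources:
        for doc in source:
            by_id[doc["id"]] = doc  # last copy seen wins
    return list(by_id.values())


big = [{"id": "1", "v": "old"}, {"id": "2", "v": "b"}]
tmp = [{"id": "1", "v": "new"}]
print(len(add_indexes(big, tmp)))         # 3: id "1" is now present twice
print(len(add_with_overwrite(big, tmp)))  # 2: the "new" copy of id "1" replaced the old one
```

In this model, nothing in the merge path decides which copy "takes precedence": both copies simply remain in the merged index.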

-- 
Regards,
Shalin Shekhar Mangar.


Re: Duplicate docs when merging indices?

2010-08-21 Thread Gora Mohanty
On Sat, 21 Aug 2010 05:26:59 -0700 (PDT)
Andrew Clegg andrew.cl...@gmail.com wrote:
[...]
 If I merge two indices with CoreAdmin, as detailed here...
 
 http://wiki.apache.org/solr/MergingSolrIndexes
 
 What happens to duplicate documents between the two? i.e. those
 that have the same unique key.
 
 What decides which copy takes precedence? Will documents get
 indexed multiple times, or will the second one just get skipped?
[...]

I have not used CoreAdmin, but I know from personal experience that
IndexMergeTool creates duplicates. I imagine the same is true for
CoreAdmin, as Solr/Lucene allows duplicate IDs.
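If a merge has already produced duplicates, one way to find them is to facet on the unique key field and ask only for values that occur at least twice. The sketch below only builds such a request URL; the base URL http://localhost:8983/solr and the field name id are assumptions, and faceting on a field with one distinct value per document can be expensive on a large index.

```python
from urllib.parse import urlencode


def duplicate_id_query(solr_url, id_field="id"):
    """Build a /select URL that facets on the unique key field and keeps only
    values that occur two or more times, i.e. the duplicated keys."""
    params = [
        ("q", "*:*"),              # match every document
        ("rows", "0"),             # only the facet counts are needed
        ("facet", "true"),
        ("facet.field", id_field),
        ("facet.mincount", "2"),   # drop keys that appear only once
        ("facet.limit", "-1"),     # return all qualifying keys, not just the default 100
    ]
    return solr_url.rstrip("/") + "/select?" + urlencode(params)


print(duplicate_id_query("http://localhost:8983/solr"))
```

Every facet value returned with a count of 2 or more is a key that exists more than once in the merged index.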

Regards,
Gora


Re: Merging Indices

2008-12-05 Thread Shalin Shekhar Mangar
On Fri, Dec 5, 2008 at 5:09 AM, ashokc [EMAIL PROTECTED] wrote:


 The SOLR wiki says

 3. Make sure both indexes you want to merge are closed.

 What exactly does 'closed' mean?


I think that would mean that the IndexReader and IndexWriter on that index
are closed.

 1. Do I need to stop SOLR search on both indexes before running the merge
 command? So a brief downtime is required?


I think so.


 Or do I simply prevent any 'updates/deletes' to these indices during the
 merge time so they can still serve up results (read only?) while I am
 creating a new merged index?

 2. Before the new index replaces the old index, do I need to stop SOLR for
 that instance? Or can I simply move the old index out and place the new
 index in the same place, without having to stop SOLR?


The rsync-based replication in Solr uses a similar scheme. It creates
hard links to the new index files in place of the old ones.
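The hard-link idea can be sketched in Python as below: each file of the new index gets a second directory entry instead of a copy, so no file data is duplicated and the link step itself is quick. This assumes a flat index directory on a POSIX filesystem, with source and destination on the same filesystem (hard links cannot cross filesystems); the function name is illustrative and is not taken from Solr's scripts.

```python
import os


def hardlink_index(src_dir, dest_dir):
    """Populate dest_dir with hard links to every file in src_dir
    (roughly what `cp -l` does for a flat directory)."""
    os.makedirs(dest_dir, exist_ok=True)
    for name in os.listdir(src_dir):
        src = os.path.join(src_dir, name)
        if os.path.isfile(src):  # Lucene index directories are normally flat
            # Raises FileExistsError if dest_dir already has a file of that name.
            os.link(src, os.path.join(dest_dir, name))
```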


 3. If SOLR has to be stopped during the merge operation, can we work with a
 redundant/failover instance and stagger the merge so the search service will
 not go down? Any guidelines here are welcome.


It is not very clear as to what you are actually trying to do. Why do you
even need to merge indices? Are you creating your index outside of Solr?
Just curious to know your use-case.

-- 
Regards,
Shalin Shekhar Mangar.


Re: Merging Indices

2008-12-05 Thread Yonik Seeley
On Thu, Dec 4, 2008 at 6:39 PM, ashokc [EMAIL PROTECTED] wrote:

 The SOLR wiki says

3. Make sure both indexes you want to merge are closed.

 What exactly does 'closed' mean?

If you do a commit, and then prevent updates, the index should be
closed (no open IndexWriter).
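For Solr versions whose CoreAdmin supports the merge described on the MergingSolrIndexes wiki page, the order of operations is: commit each source core, stop sending updates, then issue the merge. A Python sketch that only builds the request URLs; the host, the core names, the index paths, and the Solr 1.4-style action=mergeindexes / indexDir parameters are assumptions to check against the wiki page for your version.

```python
from urllib.parse import urlencode


def commit_url(core_url):
    """Update-handler request that commits pending changes on one core."""
    return core_url.rstrip("/") + "/update?" + urlencode({"commit": "true"})


def merge_url(solr_url, target_core, index_dirs):
    """CoreAdmin mergeindexes request: one indexDir parameter per source index."""
    params = [("action", "mergeindexes"), ("core", target_core)]
    params += [("indexDir", d) for d in index_dirs]
    return solr_url.rstrip("/") + "/admin/cores?" + urlencode(params)


def merge_plan(solr_url, target_core, source_cores, index_dirs):
    """Ordered requests: commit every source first, then merge.
    Updates must stay stopped between these calls (not enforced here)."""
    base = solr_url.rstrip("/")
    steps = [commit_url(base + "/" + core) for core in source_cores]
    steps.append(merge_url(base, target_core, index_dirs))
    return steps


for url in merge_plan(
    "http://localhost:8983/solr",
    "core0",
    ["core1", "core2"],
    ["/opt/solr/core1/data/index", "/opt/solr/core2/data/index"],
):
    print(url)
```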

 1. Do I need to stop SOLR search on both indexes before running the merge
 command? So a brief downtime is required?
 Or do I simply prevent any 'updates/deletes' to these indices during the
 merge time so they can still serve up results (read only?) while I am
 creating a new merged index?

Preventing updates/deletes should be sufficient.
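On the application side, preventing updates/deletes can be as simple as routing every write through one lock that the merge also holds, so writes wait for the merge to finish while searches are unaffected. A minimal sketch, assuming all updates go through a single client process; WriteGate and the post callback are hypothetical names, not Solr APIs.

```python
import threading
from contextlib import contextmanager


class WriteGate:
    """Hypothetical client-side guard: updates and the merge share one lock,
    so no update can reach Solr while a merge window is open."""

    def __init__(self):
        self._lock = threading.Lock()

    @contextmanager
    def merge_window(self):
        # Held for the whole merge; concurrent send_update calls block here.
        with self._lock:
            yield

    def send_update(self, post, doc):
        # post is whatever function actually sends the document to Solr.
        with self._lock:
            post(doc)
```

Because the lock is not re-entrant, calling send_update from inside merge_window on the same thread would block forever; with several writer processes, a shared coordination mechanism would be needed instead.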

 2. Before the new index replaces the old index, do I need to stop SOLR for
 that instance? Or can I simply move the old index out and place the new
 index in the same place, without having to stop SOLR?

Yes, simply moving the index should work if you are careful to avoid
any updates since the last commit.
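Moving the merged index into place can then be a pair of renames. A sketch, assuming both directories are on the same filesystem (so each rename is a cheap metadata operation) and that no updates have been sent since the last commit; the directory names are placeholders. The two renames are not atomic as a pair, so there is a brief moment with no index at the live path. The rsync-era snapinstaller script followed its install step with a commit so that Solr would open a new searcher on the new files; treat that as something to confirm for your setup.

```python
import os


def swap_index(live_dir, merged_dir, backup_dir):
    """Move the live index aside, then move the merged index into its place.
    Each rename is atomic on POSIX, but the pair is not."""
    if os.path.exists(backup_dir):
        raise FileExistsError(backup_dir)
    os.rename(live_dir, backup_dir)
    try:
        os.rename(merged_dir, live_dir)
    except OSError:
        os.rename(backup_dir, live_dir)  # put the old index back on failure
        raise
```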

 3. If SOLR has to be stopped during the merge operation, can we work with a
 redundant/failover instance and stagger the merge so the search service will
 not go down? Any guidelines here are welcome.

 Thanks

 - ashok
 --
 View this message in context: 
 http://www.nabble.com/Merging-Indices-tp20845009p20845009.html
 Sent from the Solr - User mailing list archive at Nabble.com.




Re: Merging Indices

2008-12-05 Thread ashokc

Thanks for the help, Yonik and Shalin. It really makes things easy for me if I
do not have to stop/start the SOLR app during the merge operations.

The reason I have to do this many times a day is that I am implementing a
simple-minded entity-extraction procedure for the content I am indexing. I
have a user-defined taxonomy under which the current documents, and any new
documents, should be classified. The taxonomy defines the nested facet
fields for SOLR. When a new document is posted, the user expects to have it
available in the right facet right away. When a new document is added, my
classification procedure is as follows:

1. Create a new temporary index with that document (no taxonomy fields at
this time).
2. Search this index with each of the taxonomy terms (synonyms are employed
as well, through synonyms.txt) and find out which of these categories are
hits for this document.
3. Add a new field ... line into the document for each matching category.
4. Repost this updated document.

Now I have a new index that facets this document the same way the big
index does.

5. I now merge these two indices so that the new document is also part of
the big index.

6. Delete the temporary index

The reason for a new temporary index is that step 2 is A LOT quicker with a
single document (or a handful). If I simply posted this new doc into the big
index and then tried to classify it, this search would take a while. I have
over 200 nested taxonomy fields to search over.

Are there better approaches?

Thanks

- ashok




-- 
View this message in context: 
http://www.nabble.com/Merging-Indices-tp20845009p20859513.html
Sent from the Solr - User mailing list archive at Nabble.com.



Merging Indices

2008-12-04 Thread ashokc

The SOLR wiki says

3. Make sure both indexes you want to merge are closed.

What exactly does 'closed' mean?

1. Do I need to stop SOLR search on both indexes before running the merge
command? So a brief downtime is required?
Or do I simply prevent any 'updates/deletes' to these indices during the
merge time so they can still serve up results (read only?) while I am
creating a new merged index?

2. Before the new index replaces the old index, do I need to stop SOLR for
that instance? Or can I simply move the old index out and place the new
index in the same place, without having to stop SOLR?

3. If SOLR has to be stopped during the merge operation, can we work with a
redundant/failover instance and stagger the merge so the search service will
not go down? Any guidelines here are welcome.

Thanks

- ashok
-- 
View this message in context: 
http://www.nabble.com/Merging-Indices-tp20845009p20845009.html
Sent from the Solr - User mailing list archive at Nabble.com.