Re: Update / replication of offline indexes
Thanks Erick and Upayavira! This answers my question.
Re: Update / replication of offline indexes
See the very last line here: http://wiki.apache.org/solr/MergingSolrIndexes

Short answer is that merging will lead to duplicate documents, even with uniqueKeys defined. So you're really kind of stuck handling this outside of merge, either by shipping the list of overwritten docs and deleting them from the base index or shipping the JSON/XML format and indexing those. Of the two, I'd think the latter is easiest/least prone to surprises. Especially since you could re-run the indexing as many times as necessary. The UniqueKey bits are only guaranteed to overwrite older docs when indexing, not merging.

Best
Erick
Re: Update / replication of offline indexes
I guess without knowing more about the use case, it is difficult to see whether it is best to ship pre-prepared indexes or indexable content. Certainly the latter would be far simpler, and more in keeping with the way Solr is typically used, and personally I'd start with that.

Thinking through what you're saying: if clients may update at any time, i.e. they won't all be forced to accept every update on every occasion, you will lose much of the ability to ship partial indexes. As segments get merged over time, you'd need to ship partial indexes against all of the possible states that might exist out there, and that would simply be prohibitive.

Upayavira
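A minimal sketch of the "ship indexable content" route, in Python using only the standard library. It assumes things the thread does not state: that documents carry a date field (called `last_modified` here, a hypothetical name), that the master core answers at a URL like the one shown, and that the client's `/update` handler accepts a JSON array of documents. It also drops `_version_` from exported documents, on the assumption that a Solr 4 client would otherwise treat the supplied value as an optimistic-concurrency check.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen


def delta_query_url(core_url, since_iso, date_field="last_modified",
                    rows=1000, start=0):
    """Build a /select URL for documents changed since `since_iso`.

    `date_field` is a hypothetical field name; use whatever date field
    the schema actually defines.
    """
    params = {
        "q": f"{date_field}:[{since_iso} TO *]",  # open-ended range query
        "wt": "json",
        "rows": rows,
        "start": start,
    }
    return f"{core_url}/select?{urlencode(params)}"


def update_payload(docs):
    """Serialize exported docs as a JSON array, without _version_."""
    cleaned = [{k: v for k, v in d.items() if k != "_version_"} for d in docs]
    return json.dumps(cleaned).encode("utf-8")


def post_update(core_url, payload):
    """POST the payload to the client core and commit (needs a running Solr)."""
    req = Request(f"{core_url}/update?commit=true", data=payload,
                  headers={"Content-Type": "application/json"})
    with urlopen(req) as resp:
        return resp.status


# Offline demonstration: nothing here touches the network.
url = delta_query_url("http://localhost:8983/solr/master",
                      "2012-10-01T00:00:00Z")
payload = update_payload([{"id": "1", "title": "a", "_version_": 5}])
print(url)
print(payload.decode("utf-8"))
```

Because indexing overwrites by uniqueKey, the same payload file can be posted again if a run is interrupted, which is the "re-run as many times as necessary" property mentioned earlier in the thread.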
Re: Update / replication of offline indexes
Not sure I fully understood this and maybe you already cover that by 'merge', but if you know what you gave the client last time, you can just build a differential as a second core, then on client mount that second core and merge it into the first one (e.g. with DIH). Just a thought.

Regards,
Alex.

Personal blog: http://blog.outerthoughts.com/
LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
- Time is the quality of nature that keeps events from happening all at once. Lately, it doesn't seem to be working. (Anonymous - via GTD book)
Re: Update / replication of offline indexes
Hi Alex,

You got my point right. What I see is that merge adds duplicate documents. Is there a way to overwrite existing documents in one core with those from another? Can the merge operation lead to data corruption, say when the core on the client has uncommitted changes?

What would be a better solution for my requirement, merge or indexing XML/JSON?

Regards,
Dikchant
Re: Update / replication of offline indexes
Do you have IDs defined? How do you expect Solr to know they are duplicate records? Maybe the issue is there somewhere.

Regards,
Alex
Re: Update / replication of offline indexes
Yes, we have a uniqueKey defined, but merge still adds two documents with the same id. As I understand it, this is how Solr behaves. Correct me if I am wrong.
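The difference between indexing and merging discussed in this thread can be illustrated with a toy model in Python. This is not Solr code and makes no claim about Solr internals; an "index" here is just a list of dicts, and the function names are made up for illustration. It only mirrors the behaviour reported above: indexing replaces an older document with the same uniqueKey, while merging appends documents without looking at the key.

```python
def index_doc(index, doc, key="id"):
    """Model of indexing: an older doc with the same key is replaced."""
    kept = [d for d in index if d[key] != doc[key]]
    kept.append(doc)
    return kept


def merge_indexes(index, other):
    """Model of merging: documents are appended as-is, no key check."""
    return index + other


def delete_then_merge(index, delta, key="id"):
    """Workaround: delete ids present in the delta, then merge."""
    delta_ids = {d[key] for d in delta}
    pruned = [d for d in index if d[key] not in delta_ids]
    return merge_indexes(pruned, delta)


base = [{"id": "1", "title": "old"}, {"id": "2", "title": "unchanged"}]
delta = [{"id": "1", "title": "new"}]

# Plain merge keeps both versions of id "1".
merged = merge_indexes(base, delta)
print(len(merged))  # 3

# Indexing the delta document by document leaves one copy per id.
reindexed = base
for doc in delta:
    reindexed = index_doc(reindexed, doc)
print(len(reindexed))  # 2

# Deleting first and then merging gives the same result as indexing.
print(delete_then_merge(base, delta) == reindexed)  # True
```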
Re: Update / replication of offline indexes
This is somewhat confusing. You say that box2 is the slave, yet they're not connected? Then you need to copy the solr home/data index from box 1 to box 2 manually (I'd have box2 solr shut down at the time) and restart Solr. Why can't the boxes be connected? That's a much simpler way of going about it. Best Erick

On Tue, Dec 11, 2012 at 1:04 AM, Dikchant Sahi contacts...@gmail.com wrote:
Hi Walter, Thanks for the response. Commit will help to reflect changes on Box1. We are able to achieve this. We want the changes to be reflected on Box2. We have two indexes. Box1: Master. The DB has been set up and Data Import runs on this. Box2: Slave running. We want all the updates on Box1 to be merged into the index on Box2. The two boxes are not connected over a network. How can we achieve this? Please let me know if I am not clear. Thanks again! Regards, Dikchant
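The manual copy Erick describes (stop Solr on box 2, replace its index directory with the copy from box 1, restart) could be scripted roughly as below. This is only a sketch: the function name and the `.incoming` / `.previous` suffixes are made up for illustration, and it assumes Solr on the target machine is already stopped and that the caller restarts it afterwards.

```python
import os
import shutil


def replace_index(new_index_dir, live_index_dir):
    """Swap the live index directory for a copy of new_index_dir.

    Assumes Solr is NOT running while this executes (assumption from the
    advice in the thread). Returns the path where the old index was kept.
    """
    live = os.path.normpath(live_index_dir)
    staging = live + ".incoming"   # hypothetical naming, not a Solr convention
    backup = live + ".previous"    # hypothetical naming, not a Solr convention

    # Remove leftovers from an earlier, possibly interrupted, run.
    for leftover in (staging, backup):
        if os.path.isdir(leftover):
            shutil.rmtree(leftover)

    # Copy first, so the live index is untouched if the copy fails midway.
    shutil.copytree(new_index_dir, staging)

    # Keep the old index as a fallback, then move the new one into place.
    if os.path.isdir(live):
        os.rename(live, backup)
    os.rename(staging, live)
    return backup
```

Copying into a staging directory before renaming means a failed or partial copy from the CD never leaves the client with a half-written index.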
Re: Update / replication of offline indexes
Hi Erick, Sorry for creating the confusion. By slave, I mean that the index on the client machine will be a replica of the master; it is not the same as the slave in the master-slave model. Details below.

The system is being developed to provide search on 1000s of systems, a majority of which will be offline. The idea is a search system sold on a subscription basis. For each subscriber, we will copy the master index to their local machine on a drive or CD. Now, if a subscriber comes back after 2 months and wants the updates, we want to provide only the deltas for those 2 months, as the volume of data is huge. For this we can think of two approaches:
1. Fetch the documents which are less than 2 months old in JSON format from the master Solr. Copy them to the subscriber machine (on CD or memory stick) and index those documents.
2. Create separate indexes for each month on our master machine. Copy the indexes to the client machine and merge. Prior to the merge we need to delete the records which the new index contains, to avoid duplicates.
When the setup is new, we will copy the complete index and restart Solr. We are not sure of the best approach for copying the deltas. Thanks, Dikchant

On Thu, Dec 13, 2012 at 3:52 AM, Erick Erickson erickerick...@gmail.com wrote:
This is somewhat confusing. You say that box2 is the slave, yet they're not connected? Then you need to copy the solr home/data index from box 1 to box 2 manually (I'd have box2 solr shut down at the time) and restart Solr. Why can't the boxes be connected? That's a much simpler way of going about it. Best Erick
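The JSON approach from the message above (fetch documents changed in the last 2 months from the master, carry them over, index them on the subscriber machine) might look roughly like this. It is a minimal sketch under assumptions the thread does not state: the schema has an indexed date field, called `last_modified` here; every field the client needs is stored, since a select only returns stored fields; and the host and core names are placeholders. Only the query and the update payload are built; fetching from the master and posting to the client's update handler are left out.

```python
import json
from datetime import datetime, timezone
from urllib.parse import urlencode


def delta_query_params(since, field="last_modified", start=0, rows=1000):
    """Select parameters for documents changed at or after `since`.

    `field` is an assumed schema field name; `since` should be timezone-aware.
    """
    stamp = since.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    return {
        "q": f"{field}:[{stamp} TO *]",  # Solr range query, open-ended upper bound
        "wt": "json",
        "start": start,                  # page through large deltas with start/rows
        "rows": rows,
    }


def delta_query_url(base_url, since, **kwargs):
    """Full select URL on the master for one page of the delta."""
    return f"{base_url}/select?{urlencode(delta_query_params(since, **kwargs))}"


def docs_to_update_body(select_response):
    """Turn a parsed select response into a JSON array of documents to index.

    Dropping `_version_` is an assumption: it is set by Solr itself where
    present, so the client is left to assign its own value.
    """
    docs = select_response["response"]["docs"]
    cleaned = [{k: v for k, v in doc.items() if k != "_version_"} for doc in docs]
    return json.dumps(cleaned)


if __name__ == "__main__":
    # Placeholder host and core name.
    since = datetime(2012, 10, 13, tzinfo=timezone.utc)
    print(delta_query_url("http://master.example:8983/solr/collection1", since))
```

Because re-indexing a document with the same uniqueKey overwrites the older copy, the same export file can be applied more than once without creating duplicates. Documents deleted on the master are not covered by this query and would need to be shipped separately.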
Re: Update / replication of offline indexes
You do not need to manage online and offline indexes. Commit when you are done with your updates and Solr will take care of it for you. The changes are not live until you commit. wunder

On Dec 10, 2012, at 9:46 PM, Dikchant Sahi wrote:
Hi, How can we do a delta update of offline indexes? We have the master index on which data import will be done. For a full update, the index directory will be copied to the slave machine on CD, as the slave/client machine is offline. So, what should be the approach for getting the delta to the slave? I can think of two approaches.
1. Create separate indexes of the delta on the master machine, copy them to the slave machine and merge. Before merging the indexes on the client machine, delete all the updated and deleted documents on the client machine, else the merge will add duplicates. So along with the index, we need to transfer the list of documents which have been updated/deleted.
2. Extract all the documents which have changed since a particular time as XML/JSON and index them on the client machine.
The indexes are huge, so we cannot roll over the index every time. Please help me with your take and the challenges you see in the above approaches. Please suggest any better approach you can think of. Thanks a ton! Regards, Dikchant

-- Walter Underwood wun...@wunderwood.org
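The merge approach in the original question (delete the updated and deleted documents on the client, then merge the delta index) could be sketched as two requests. As Erick notes elsewhere in the thread, uniqueKey overwrites apply only when indexing, not when merging, so the delete step is what prevents duplicates. The sketch below only builds the request contents: an XML delete message for the core's update handler and a CoreAdmin `mergeindexes` URL as described on the MergingSolrIndexes wiki page. The core name, paths and ids are placeholders, and the deletes would presumably need to be committed before the merge is run.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode


def delete_message(ids):
    """XML delete message listing every uniqueKey value to remove.

    `ids` is the shipped list of updated and deleted documents.
    """
    root = ET.Element("delete")
    for doc_id in ids:
        ET.SubElement(root, "id").text = str(doc_id)
    return ET.tostring(root, encoding="unicode")


def merge_url(solr_root, target_core, index_dirs):
    """CoreAdmin URL that merges one or more index directories into a core."""
    params = [("action", "mergeindexes"), ("core", target_core)]
    params += [("indexDir", path) for path in index_dirs]
    return f"{solr_root}/admin/cores?{urlencode(params)}"


if __name__ == "__main__":
    # Placeholder ids, core name and path, for illustration only.
    print(delete_message(["doc-17", "doc-42"]))
    print(merge_url("http://localhost:8983/solr", "core0", ["/media/cd/delta/index"]))
```

Compared with re-indexing a JSON export, this route needs the id list to be complete and correct on every run; a missed id leaves a duplicate in the merged index.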
Re: Update / replication of offline indexes
Hi Walter, Thanks for the response. Commit will help to reflect changes on Box1. We are able to achieve this. We want the changes to be reflected on Box2. We have two indexes. Box1: Master. The DB has been set up and Data Import runs on this. Box2: Slave running. We want all the updates on Box1 to be merged into the index on Box2. The two boxes are not connected over a network. How can we achieve this? Please let me know if I am not clear. Thanks again! Regards, Dikchant

On Tue, Dec 11, 2012 at 11:22 AM, Walter Underwood wun...@wunderwood.org wrote:
You do not need to manage online and offline indexes. Commit when you are done with your updates and Solr will take care of it for you. The changes are not live until you commit. wunder

-- Walter Underwood wun...@wunderwood.org