Re: Update / replication of offline indexes

2012-12-17 Thread Dikchant Sahi
Thanks Erick and Upayavira! This answers my question.



Re: Update / replication of offline indexes

2012-12-16 Thread Erick Erickson
See the very last line here: http://wiki.apache.org/solr/MergingSolrIndexes

Short answer is that merging will lead to duplicate documents, even with
uniqueKeys defined.

So you're really kind of stuck handling this outside of merge, either by
shipping the list of overwritten docs and deleting them from the base
index, or by shipping the JSON/XML format and indexing those. Of the two,
I'd think the latter is easiest/least prone to surprises, especially since
you could re-run the indexing as many times as necessary.

The UniqueKey bits are only guaranteed to overwrite older docs when
indexing, not merging.
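
To make the difference concrete, here is a toy model of the two behaviours
described above. It is only a sketch, not Solr code: an "index" is a plain
Python list, and the function names `merge` and `index` are made up for
illustration. It shows why merging yields duplicates while re-indexing the
same delta is safe to repeat.

```python
# Toy model only: an "index" is a list of documents, each a dict with an "id".
# This is not Solr code; it mirrors the behaviour described in this thread.

def merge(base, incoming):
    """Merge appends every incoming document; the unique key is not consulted."""
    return base + incoming


def index(base, incoming, key="id"):
    """Indexing replaces any existing document that has the same unique key."""
    by_key = {doc[key]: doc for doc in base}
    for doc in incoming:
        by_key[doc[key]] = doc  # a newer doc overwrites the older one
    return list(by_key.values())


base = [{"id": "1", "title": "old"}, {"id": "2", "title": "unchanged"}]
delta = [{"id": "1", "title": "new"}, {"id": "3", "title": "added"}]

merged = merge(base, delta)
indexed_once = index(base, delta)
indexed_twice = index(indexed_once, delta)  # re-running changes nothing

print(len(merged))                     # 4: id "1" is now present twice
print(len(indexed_once))               # 3
print(indexed_once == indexed_twice)   # True
```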

Best
Erick



Re: Update / replication of offline indexes

2012-12-14 Thread Upayavira
I guess without knowing more about the use case, it is difficult to see
whether it is best to ship pre-prepared indexes or indexable content.
Certainly the latter would be far simpler, and more in keeping with the
way Solr is typically used, and personally I'd start with that.

Thinking through what you're saying: if clients may update at any time,
i.e. they won't all be forced to accept every update on every occasion,
you will lose much of the ability to ship partial indexes. As segments get
merged over time, you'd need to ship partial indexes against all of the
possible states that might exist out there, and that would simply be
prohibitive.

Upayavira


Re: Update / replication of offline indexes

2012-12-13 Thread Alexandre Rafalovitch
Not sure I fully understood this and maybe you already cover that by
'merge', but if you know what you gave the client last time, you can just
build a differential as a second core, then on client mount that second
core and merge it into the first one (e.g. with DIH).
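
A rough sketch of what "merge the second core into the first" could look
like. Note the swap: DIH is mentioned above, but this sketch instead builds
a CoreAdmin `mergeindexes` request, the mechanism covered on the
MergingSolrIndexes wiki page linked elsewhere in this thread. The host,
port and the core names `main` and `delta` are assumptions for
illustration. As discussed in the rest of the thread, this kind of merge
does not de-duplicate on the uniqueKey.

```python
from urllib.parse import urlencode


def merge_cores_url(base_url, target_core, source_core):
    """Build a CoreAdmin mergeindexes URL that merges source_core into target_core."""
    params = {
        "action": "mergeindexes",
        "core": target_core,     # core that receives the documents
        "srcCore": source_core,  # core holding the differential index
    }
    return base_url.rstrip("/") + "/admin/cores?" + urlencode(params)


# Hypothetical local Solr instance and core names.
url = merge_cores_url("http://localhost:8983/solr", "main", "delta")
print(url)
# http://localhost:8983/solr/admin/cores?action=mergeindexes&core=main&srcCore=delta
```

The merged documents only become visible after a commit on the target core.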

Just a thought.

Regards,
   Alex.

Personal blog: http://blog.outerthoughts.com/
LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
- Time is the quality of nature that keeps events from happening all at
once. Lately, it doesn't seem to be working.  (Anonymous  - via GTD book)



On Thu, Dec 13, 2012 at 5:28 PM, Dikchant Sahi contacts...@gmail.com wrote:

 Hi Erick,

 Sorry for creating the confusion. By slave, I mean that the indexes on the
 client machine will be a replica of the master; it is not the same as the
 slave in the master-slave model. Below is the detail:

 The system is being developed to support a search facility on 1000s of
 systems, a majority of which will be offline.

 The idea is that we will have a search system which will be sold on a
 subscription basis. For each subscriber, we will copy the master index to
 their local machine, over a drive or CD. Now, if a subscriber comes back
 after 2 months and wants the updates, we just want to provide the deltas
 for those 2 months, as the volume of data is huge. For this we can think
 of two approaches:
 1. Fetch the documents which are less than 2 months old in JSON format
 from the master Solr. Copy them to the subscriber machine (through CD /
 memory sticks) and index those documents.
 2. Create separate indexes for each month on our master machine. Copy the
 indexes to the client machine and merge. Prior to the merge we need to
 delete the records which the new index has, to avoid duplicates.

 As long as the setup is new, we will copy the complete index and restart
 Solr. We are not sure of the best approach for copying the deltas.
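
Approach 1 above could start from a query like the one sketched here. This
is a minimal sketch under assumptions: the date field name `last_modified`,
the core name `main` and the host are all hypothetical, and the thread does
not say how document age is recorded. It only builds the select URL, using
Solr's range-query and date-math syntax. Note also that a select response
returns stored fields only, so this route works for re-indexing only if
every field needed on the client is stored.

```python
from urllib.parse import urlencode


def delta_export_url(base_url, core, date_field, since, rows=1000, start=0):
    """Build a select URL for documents whose date_field is at or after `since`."""
    params = {
        "q": f"{date_field}:[{since} TO *]",  # open-ended range query
        "wt": "json",                         # ask for a JSON response
        "rows": rows,                         # page size
        "start": start,                       # page offset
    }
    return f"{base_url.rstrip('/')}/{core}/select?{urlencode(params)}"


# Hypothetical host, core and field; NOW-2MONTHS is Solr date math.
print(delta_export_url("http://localhost:8983/solr", "main",
                       "last_modified", "NOW-2MONTHS"))
```

Paging with `start` and `rows` keeps each response small enough to write
out to the transfer medium in pieces.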

 Thanks,
 Dikchant



 On Thu, Dec 13, 2012 at 3:52 AM, Erick Erickson erickerick...@gmail.com
 wrote:

  This is somewhat confusing. You say that box2 is the slave, yet they're
  not connected? Then you need to copy the solr home/data index from box 1
  to box 2 manually (I'd have box2 solr shut down at the time) and restart
  Solr.

  Why can't the boxes be connected? That's a much simpler way of going
  about it.
 
  Best
  Erick
 
 
  On Tue, Dec 11, 2012 at 1:04 AM, Dikchant Sahi contacts...@gmail.com
  wrote:
 
   Hi Walter,
  
   Thanks for the response.
  
   Commit will help to reflect changes on Box1. We are able to achieve
   this. We want the changes to be reflected on Box2.

   We have two indexes. Say
   Box1: Master. DB has been set up. Data Import runs on this.
   Box2: Slave running.

   We want all the updates on Box1 to be merged/present in the index on
   Box2. The two boxes are not connected over the network. How can we
   achieve this?

   Please let me know if I am not clear.
  
   Thanks again!
  
   Regards,
   Dikchant
  
   On Tue, Dec 11, 2012 at 11:22 AM, Walter Underwood wun...@wunderwood.org
   wrote:

    You do not need to manage online and offline indexes. Commit when you
    are done with your updates and Solr will take care of it for you. The
    changes are not live until you commit.

    wunder
   
    On Dec 10, 2012, at 9:46 PM, Dikchant Sahi wrote:

     Hi,

     How can we do a delta update of offline indexes?

     We have the master index on which the data import will be done. In
     case of a full update, the index directory will be copied to the slave
     machine on CD, as the slave/client machine is offline. So, what should
     be the approach for getting the delta to the slave? I can think of two
     approaches.

     1. Create separate indexes of the delta on the master machine, copy
     them to the slave machine and merge. Before merging the indexes on the
     client machine, delete all the updated and deleted documents on the
     client machine, else the merge will add duplicates. So along with the
     index, we need to transfer the list of documents which have been
     updated/deleted.

     2. Extract all the documents which have changed since a particular
     time in XML/JSON and index them on the client machine.
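
For approach 1, the shipped list of updated/deleted documents has to be
turned into deletes on the client before the merge. A minimal sketch of
that step, assuming the list is simply a sequence of uniqueKey values (the
ids below are invented): it produces a Solr XML delete message that can be
posted to the client's update handler and then committed.

```python
import xml.etree.ElementTree as ET


def delete_by_id_xml(ids):
    """Return a Solr XML update message that deletes the given document ids."""
    root = ET.Element("delete")
    for doc_id in ids:
        ET.SubElement(root, "id").text = str(doc_id)
    return ET.tostring(root, encoding="unicode")


# Hypothetical ids taken from the shipped "updated/deleted" list.
print(delete_by_id_xml(["doc-17", "doc-42"]))
# <delete><id>doc-17</id><id>doc-42</id></delete>
```

Building the message with ElementTree rather than string concatenation
means ids containing characters such as `&` or `<` are escaped correctly.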

     The size of the indexes is huge, so we cannot roll over the whole
     index every time.

     Please help me with your take and the challenges you see in the above
     approaches. Please suggest any better approach you can think of.

     Thanks a ton!

     Regards,
     Dikchant

    --
    Walter Underwood
    wun...@wunderwood.org

Re: Update / replication of offline indexes

2012-12-13 Thread Dikchant Sahi
Hi Alex,

You got my point right. What I see is that merge adds duplicate documents.
Is there a way to overwrite existing documents in one core with those from
another? Can the merge operation lead to data corruption, say when the core
on the client has uncommitted changes?

What would be a better solution for my requirement, merge or indexing
XML/JSON?

Regards,
Dikchant


Re: Update / replication of offline indexes

2012-12-13 Thread Alexandre Rafalovitch
Do you have IDs defined? How do you expect Solr to know they are duplicate
records? Maybe the issue is there somewhere.

Regards,
 Alex

Re: Update / replication of offline indexes

2012-12-13 Thread Dikchant Sahi
Yes, we have a uniqueKey defined, but merge adds two documents with the
same id. As per my understanding, this is how Solr behaves. Correct me if
I am wrong.


Re: Update / replication of offline indexes

2012-12-12 Thread Erick Erickson
This is somewhat confusing. You say that box2 is the slave, yet they're not
connected? Then you need to copy the solr home/data index from box 1 to
box 2 manually (I'd have box2 solr shut down at the time) and restart Solr.

Why can't the boxes be connected? That's a much simpler way of going about
it.
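
The manual copy described above could be scripted roughly as follows. This is only a sketch: it assumes Solr on box 2 has already been shut down, and the function name and the `.incoming` / `.previous` directory suffixes are made up for illustration. It copies the new index next to the live one first, so a bad CD or a partial copy never touches the index Solr will open on restart.

```python
import shutil
from pathlib import Path


def replace_index(new_index: Path, live_index: Path) -> Path:
    """Swap a copied index into place and return the path of the backup.

    Assumes Solr is NOT running while this executes.
    """
    if not new_index.is_dir():
        raise FileNotFoundError(f"no index directory at {new_index}")

    # Hypothetical sibling directories used for staging and rollback.
    staging = live_index.with_name(live_index.name + ".incoming")
    backup = live_index.with_name(live_index.name + ".previous")

    # Copy from the removable media first; the live index is untouched so far.
    if staging.exists():
        shutil.rmtree(staging)
    shutil.copytree(new_index, staging)

    # Keep exactly one previous copy so the swap can be undone by hand.
    if backup.exists():
        shutil.rmtree(backup)
    if live_index.exists():
        live_index.rename(backup)
    staging.rename(live_index)
    return backup
```

After the swap, start Solr again so it opens the new index. Keeping the previous directory around costs disk space, which may matter with a very large index.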

Best
Erick


On Tue, Dec 11, 2012 at 1:04 AM, Dikchant Sahi contacts...@gmail.com wrote:

 Hi Walter,

 Thanks for the response.

 A commit makes the changes visible on Box1, and we already have that working.
 What we want is for the changes to be reflected on Box2.

 We have two indexes:
 Box1: Master. The DB has been set up and Data Import runs here.
 Box2: Slave, running.

 We want all the updates on Box1 to be merged into the index on Box2. The
 two boxes are not connected over a network. How can we achieve this?

 Please let me know if I am not clear.

 Thanks again!

 Regards,
 Dikchant




Re: Update / replication of offline indexes

2012-12-12 Thread Dikchant Sahi
Hi Erick,

Sorry for the confusion. By "slave" I mean that the index on each client
machine will be a replica of the master; it is not a slave in the sense of
Solr's master-slave replication. Details below:

The system is being developed to provide search on thousands of machines,
a majority of which will be offline.

The idea is that the search system will be sold on a subscription basis.
For each subscriber, we will copy the master index to their local machine
on a drive or CD. If a subscriber comes back after 2 months and wants the
updates, we want to provide only the deltas for those 2 months, because the
volume of data is huge. We can think of two approaches:
1. Fetch the documents that are less than 2 months old from the master Solr
in JSON format, copy them to the subscriber machine (on CD or memory stick),
and index them there.
2. Create a separate index for each month on our master machine, copy those
indexes to the client machine, and merge them. Before merging, we need to
delete from the client index the records that the new index contains, to
avoid duplicates.

For a new setup, we will copy the complete index and restart Solr. We are
not sure of the best approach for copying the deltas.
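
Approach 1 could look something like the sketch below. Several things here are assumptions, not details from this thread: the date field name passed as `date_field`, the core URLs, and that the update handler at `/update` accepts JSON (true for Solr 4.x; older versions use a different path). It also relies on every field being stored, since `/select` only returns stored fields.

```python
import json
import urllib.parse
import urllib.request


def delta_query_url(master_url, date_field, since, start=0, rows=500):
    """Build a /select URL for documents whose date_field is >= since (ISO 8601, UTC)."""
    params = {
        "q": f"{date_field}:[{since} TO *]",
        "wt": "json",
        "start": start,
        "rows": rows,
    }
    return f"{master_url}/select?{urllib.parse.urlencode(params)}"


def strip_internal_fields(doc):
    """Drop fields named like _version_ (leading and trailing underscore).

    Solr maintains these itself, and sending a stale _version_ back can make
    the update fail on the client.
    """
    return {k: v for k, v in doc.items()
            if not (k.startswith("_") and k.endswith("_"))}


def export_delta(master_url, date_field, since, out_path, rows=500):
    """Page through the master and write the changed documents as one JSON array."""
    docs, start = [], 0
    while True:
        url = delta_query_url(master_url, date_field, since, start, rows)
        with urllib.request.urlopen(url) as resp:
            page = json.load(resp)["response"]["docs"]
        if not page:
            break
        docs.extend(strip_internal_fields(d) for d in page)
        start += len(page)
    with open(out_path, "w", encoding="utf-8") as fh:
        json.dump(docs, fh)
    return len(docs)


def index_delta(client_url, in_path):
    """Post the exported file to the client's update handler and commit."""
    with open(in_path, "rb") as fh:
        body = fh.read()
    req = urllib.request.Request(
        f"{client_url}/update?commit=true",
        data=body,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return resp.status
```

On the master this would be run as something like `export_delta("http://localhost:8983/solr/core0", "last_modified", "2012-10-17T00:00:00Z", "delta.json")`, and on the client as `index_delta("http://localhost:8983/solr/core0", "delta.json")`. Because indexing overwrites documents with the same uniqueKey, the import can be re-run safely. Two limits of the sketch: it holds the whole delta in memory, and it does not carry documents deleted on the master, so those ids would have to be shipped and deleted separately.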

Thanks,
Dikchant



On Thu, Dec 13, 2012 at 3:52 AM, Erick Erickson erickerick...@gmail.com wrote:

 This is somewhat confusing. You say that box2 is the slave, yet they're not
 connected? Then you need to copy the solr home/data index from box 1 to
 box 2 manually (I'd have box2 solr shut down at the time) and restart Solr.

 Why can't the boxes be connected? That's a much simpler way of going about
 it.

 Best
 Erick


 



Re: Update / replication of offline indexes

2012-12-10 Thread Walter Underwood
You do not need to manage online and offline indexes. Commit when you are done 
with your updates and Solr will take care of it for you. The changes are not 
live until you commit.
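
As a rough illustration of "commit when you are done": the sketch below sends documents in several uncommitted batches and issues a single commit at the end. The core URL, batch size and helper names are assumptions for the example, as is a JSON-capable `/update` handler (Solr 4.x).

```python
import json
import urllib.request


def batches(docs, size):
    """Yield consecutive slices of docs with at most size items each."""
    for i in range(0, len(docs), size):
        yield docs[i:i + size]


def post_json(url, payload):
    """POST payload as JSON and return the HTTP status code."""
    req = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return resp.status


def index_then_commit(core_url, docs, batch_size=1000):
    """Send all batches uncommitted, then make them visible with one commit."""
    for batch in batches(docs, batch_size):
        post_json(f"{core_url}/update", batch)
    # Searchers keep seeing the old index until this commit is processed.
    post_json(f"{core_url}/update", {"commit": {}})
```

If autoCommit is enabled in solrconfig.xml, some batches may become visible before the final commit.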

wunder

On Dec 10, 2012, at 9:46 PM, Dikchant Sahi wrote:

 Hi,
 
 How can we do a delta update of offline indexes?
 
 We have a master index on which data import is done. For a full update,
 the index directory is copied to the slave machine on CD, as the
 slave/client machine is offline.
 What should the approach be for getting the delta to the slave? I can
 think of two approaches.
 
 1. Create a separate index of the delta on the master machine, copy it to
 the slave machine, and merge. Before merging on the client machine, delete
 all the updated and deleted documents from the client index; otherwise the
 merge will add duplicates. So along with the index, we need to transfer
 the list of documents that have been updated or deleted.
 
 2. Extract all the documents that have changed since a particular time as
 XML/JSON and index them on the client machine.
 
 The indexes are huge, so we cannot roll over the whole index every time.
 
 Please share your take on these approaches and any challenges you see.
 Please also suggest any better approach you can think of.
 
 Thanks a ton!
 
 Regards,
 Dikchant
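
Approach 1 in the quoted question (delete the superseded documents, then merge the delta index) might be sketched as below. The CoreAdmin `mergeindexes` action with `core` and `indexDir` parameters is described on the MergingSolrIndexes wiki page; the core name, paths and helper names are placeholders, and the list of ids is assumed to have been shipped alongside the delta index.

```python
import urllib.parse
import urllib.request
from xml.sax.saxutils import escape


def delete_message(ids):
    """Build an XML delete-by-id message for the update handler."""
    return "<delete>" + "".join(f"<id>{escape(str(i))}</id>" for i in ids) + "</delete>"


def merge_url(solr_url, target_core, index_dir):
    """Build a CoreAdmin mergeindexes URL that merges index_dir into target_core."""
    params = {"action": "mergeindexes", "core": target_core, "indexDir": index_dir}
    return f"{solr_url}/admin/cores?{urllib.parse.urlencode(params)}"


def post_xml(url, body):
    """POST an XML update message and discard the response body."""
    req = urllib.request.Request(
        url,
        data=body.encode("utf-8"),
        headers={"Content-Type": "text/xml; charset=utf-8"},
    )
    urllib.request.urlopen(req).close()


def apply_delta(solr_url, target_core, ids_to_delete, index_dir):
    """Delete superseded documents, merge the delta index, then commit."""
    update = f"{solr_url}/{target_core}/update"
    # 1. Remove every document that was updated or deleted on the master.
    post_xml(f"{update}?commit=true", delete_message(ids_to_delete))
    # 2. Merge the shipped delta index into the client core.
    urllib.request.urlopen(merge_url(solr_url, target_core, index_dir)).close()
    # 3. Commit so the merged segments become visible to searches.
    post_xml(update, "<commit/>")
```

The order matters: a merge does not honor the uniqueKey, so any id left in the client index and also present in the delta ends up duplicated.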

--
Walter Underwood
wun...@wunderwood.org





Re: Update / replication of offline indexes

2012-12-10 Thread Dikchant Sahi
Hi Walter,

Thanks for the response.

A commit makes the changes visible on Box1, and we already have that working.
What we want is for the changes to be reflected on Box2.

We have two indexes:
Box1: Master. The DB has been set up and Data Import runs here.
Box2: Slave, running.

We want all the updates on Box1 to be merged into the index on Box2. The
two boxes are not connected over a network. How can we achieve this?

Please let me know if I am not clear.

Thanks again!

Regards,
Dikchant

On Tue, Dec 11, 2012 at 11:22 AM, Walter Underwood wun...@wunderwood.org wrote:

 You do not need to manage online and offline indexes. Commit when you are
 done with your updates and Solr will take care of it for you. The changes
 are not live until you commit.

 wunder

 --
 Walter Underwood
 wun...@wunderwood.org