SolrJ/BinaryResponseWriter should be used only for the calls that fetch
metadata about the index. The actual index transfer must be done over
plain HTTP. I may propose a simple BinaryRawResponseWriter for that.

Sending a huge file in a single response is definitely a bad idea. It
should be sent in chunks of, say, 10MB or so (configurable). There must
also be some mechanism to generate checksums for the whole file and, if
possible, for individual chunks.
A solution can look like this:
* getFileList: get the names of the index files and their checksums
(NamedList response)
* getFilePart: for parts 1..n of the configured chunk size (simple binary
output/http)
* Join parts 1..n and compare the checksum of the result with the one
from getFileList
* If it passes, keep the file and delete the parts
* If it fails, get the checksums for the individual chunks (NamedList
response) and re-fetch the corrupted chunks (simple binary output/http)
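
To make the verify/re-fetch steps above concrete, here is a rough sketch in
Java. It is only an illustration under assumptions of mine, not a proposed
implementation: the class and method names are hypothetical, CRC32 is just
one possible checksum, everything is held in memory as byte arrays, and the
"re-fetch" is simulated by copying the chunk from a local copy of the
master's bytes instead of making an HTTP call.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.zip.CRC32;

// Hypothetical sketch: none of these names exist in Solr.
public class ChunkedTransferSketch {

    /** CRC32 of data[off, off + len). CRC32 is an assumed checksum choice. */
    public static long checksum(byte[] data, int off, int len) {
        CRC32 crc = new CRC32();
        crc.update(data, off, len);
        return crc.getValue();
    }

    /** One checksum per chunk; the last chunk may be shorter than chunkSize. */
    public static List<Long> chunkChecksums(byte[] data, int chunkSize) {
        List<Long> sums = new ArrayList<>();
        for (int off = 0; off < data.length; off += chunkSize) {
            int len = Math.min(chunkSize, data.length - off);
            sums.add(checksum(data, off, len));
        }
        return sums;
    }

    /** Indexes of chunks whose checksum differs from what the master reported. */
    public static List<Integer> corruptedChunks(byte[] received,
                                                List<Long> expected,
                                                int chunkSize) {
        List<Long> actual = chunkChecksums(received, chunkSize);
        List<Integer> bad = new ArrayList<>();
        for (int i = 0; i < expected.size(); i++) {
            if (i >= actual.size() || !expected.get(i).equals(actual.get(i))) {
                bad.add(i);
            }
        }
        return bad;
    }

    /**
     * Stand-in for re-fetching the corrupted chunks: copies them from the
     * master's bytes. Assumes both arrays have the same length.
     */
    public static void repair(byte[] received, byte[] master,
                              List<Integer> bad, int chunkSize) {
        for (int i : bad) {
            int off = i * chunkSize;
            int len = Math.min(chunkSize, master.length - off);
            System.arraycopy(master, off, received, off, len);
        }
    }
}
```

The slave would first compare the whole-file checksum, and only on a
mismatch ask for the per-chunk checksums and call something like
corruptedChunks() to decide which parts to fetch again.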

Once all the files are downloaded and the checksums match, trigger a
snapinstall.

The details of the snapinstall on Windows (with or without hard links)
are still a bit fuzzy. But in the worst case a copy should be OK
(better than having no replication at all).

The solution may not be optimal for replicating a non-optimized index.
But in other cases we may be able to achieve comparable
performance.


--Noble


On Tue, Apr 29, 2008 at 7:18 PM, Yonik Seeley <[EMAIL PROTECTED]> wrote:
> I agree with the goals - having a replication module that was more
>  integrated with Solr and worked in Windows would be nice.
>
>  The details are still a bit fuzzy though... I'm not sure if SolrJ &
>  BinaryResponseWriter should be used as the overhead when transferring
>  gigabytes of files would probably be significant.  One would probably
>  want to transfer the file in chunks also...  a single gigabyte HTTP
>  request is probably not the best idea.
>
>  -Yonik
>
>
>
>  On Tue, Apr 29, 2008 at 5:01 AM, Noble Paul നോബിള്‍ नोब्ळ्
>  <[EMAIL PROTECTED]> wrote:
>  > hi ,
>  >  The current replication strategy in solr involves shell scripts . The
>  >  following are the drawbacks
>  >  *  It does not work with windows
>  >  * Replication works as a separate piece not integrated with solr.
>  >  * Cannot control replication from solr admin/JMX
>  >  * Each operation requires manual telnet to the host
>  >
>  >  Doing the replication within java code has the following advantages
>  >  * Platform independence
>  >  * Manual steps can be completely eliminated. Everything can be driven
>  >  from solrconfig.xml .
>  >  ** Just put in the url of the master in the slaves that should be good
>  >  enough to enable replication. Other things like frequency of
>  >  snapshoot/snappull can also be configured
>  >  * Start/stop can be triggered from solr/admin or JMX
>  >  * Can get the status/progress while replication is going on
>  >  * No need to have a login into the machine
>  >
>  >  The implementation can be done as two components
>  >  * A SolrEventListener which does a snapshoot . Same as done by the script
>  >  * A ReplicationHandler which can act as a server to dish out the index
>  >  snapshots (in the master)
>  >  ** In the slave the same handler can poll at regular intervals and if
>  >  there is a new snapshot fetch the index over http (it can use
>  >  solrj+BinaryResponseWriter)
>  >  * The same Handler can do a snap install
>  >  * The Handler may expose all the operations over a REST interface or JMX
>  >  * It may also show the current state of the master index through the
>  >  console
>  >
>  >  What do you think?
>  >
>  >  --
>  >  --Noble Paul
>  >
>
