We can probably do away with hard-links if a core swap (rename) can be made to work without downtime.
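For what it's worth, the core-swap idea above can be sketched with java.nio: build the new index in a sibling directory and rename it into place. This is only a sketch under assumptions — the directory names, the retired-copy step, and the `CoreSwap` class are all hypothetical, not actual Solr code.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

public class CoreSwap {
    // Replace the live index directory with a freshly built one via rename.
    // ATOMIC_MOVE fails fast if the platform cannot rename atomically
    // (e.g. across filesystems), which is the case we'd need to detect.
    public static void swap(Path newIndex, Path liveIndex, Path retired) throws IOException {
        Files.move(liveIndex, retired, StandardCopyOption.ATOMIC_MOVE);
        Files.move(newIndex, liveIndex, StandardCopyOption.ATOMIC_MOVE);
    }

    public static void main(String[] args) throws IOException {
        Path base = Files.createTempDirectory("coreswap");
        Path live = Files.createDirectory(base.resolve("index"));
        Path fresh = Files.createDirectory(base.resolve("index.new"));
        Files.writeString(fresh.resolve("segments"), "v2");
        swap(fresh, live, base.resolve("index.old"));
        System.out.println(Files.readString(live.resolve("segments")));
    }
}
```

The two renames are not atomic as a pair, so a searcher would still have to be paused or re-opened around the swap — which is exactly the "without downtime" question.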
On Tue, Apr 29, 2008 at 11:32 PM, Noble Paul നോബിള് नोब्ळ् <[EMAIL PROTECTED]> wrote:
> Solrj/BinaryResponseWriter should be used for calls to get metadata on
> the index. The actual index transfer must be done over simple HTTP. I
> may propose a simple BinaryRawResponseWriter for that.
>
> Sending a huge file in a single response is definitely a bad idea. It
> should be sent in chunks of, say, 10MB (configurable). There must also
> be some mechanism to generate checksums for the whole file and, if
> possible, for the chunks.
>
> A solution can look like this:
> * getFileList: get the names of the index files and their checksums
>   (NamedList response)
> * getFilePart: for parts 1..n of the configured chunk size (simple
>   binary output/HTTP)
> * Join parts 1..n and compare checksums
> * If it passes, keep the file and delete the parts
> * If it fails, get checksums for the individual chunks (NamedList
>   response) and re-fetch the corrupted chunks (simple binary output/HTTP)
>
> Once all the files are downloaded and the checksums match, trigger a
> snapinstall.
>
> The details of snapinstall on Windows (with or without hard links) are
> still a bit fuzzy. But in the worst case, a copy should be OK (better
> than having no replication at all).
>
> The solution may not be very optimal for replicating a non-optimized
> index, but in other cases we may be able to achieve comparable
> performance.
>
> --Noble
>
> On Tue, Apr 29, 2008 at 7:18 PM, Yonik Seeley <[EMAIL PROTECTED]> wrote:
> > I agree with the goals - having a replication module that was more
> > integrated with Solr and worked on Windows would be nice.
> >
> > The details are still a bit fuzzy though... I'm not sure if SolrJ &
> > BinaryResponseWriter should be used, as the overhead when transferring
> > gigabytes of files would probably be significant. One would probably
> > want to transfer the file in chunks also... a single gigabyte HTTP
> > request is probably not the best idea.
> >
> > -Yonik
> >
> > On Tue, Apr 29, 2008 at 5:01 AM, Noble Paul നോബിള് नोब्ळ्
> > <[EMAIL PROTECTED]> wrote:
> > > Hi,
> > > The current replication strategy in Solr involves shell scripts.
> > > The following are the drawbacks:
> > > * It does not work on Windows
> > > * Replication works as a separate piece, not integrated with Solr
> > > * Replication cannot be controlled from the Solr admin console/JMX
> > > * Each operation requires a manual telnet to the host
> > >
> > > Doing the replication within Java code has the following advantages:
> > > * Platform independence
> > > * Manual steps can be completely eliminated. Everything can be
> > >   driven from solrconfig.xml.
> > > ** Just putting the URL of the master into the slaves should be
> > >    good enough to enable replication. Other things, like the
> > >    frequency of snapshoot/snappull, can also be configured.
> > > * Start/stop can be triggered from solr/admin or JMX
> > > * Can get the status/progress while replication is going on
> > > * No need to have a login on the machine
> > >
> > > The implementation can be done as two components:
> > > * A SolrEventListener which does a snapshoot, same as done by the script
> > > * A ReplicationHandler which can act as a server to dish out the
> > >   index snapshots (on the master)
> > > ** On the slave, the same handler can poll at regular intervals and,
> > >    if there is a new snapshot, fetch the index over HTTP (it can use
> > >    SolrJ + BinaryResponseWriter)
> > > * The same handler can do a snapinstall
> > > * The handler may expose all the operations over a REST interface or JMX
> > > * It may also show the current state of the master index through the
> > >   console
> > >
> > > What do you think?
> > >
> > > --
> > > --Noble Paul

--
Regards,
Shalin Shekhar Mangar.
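The per-chunk checksum scheme proposed in the thread (verify the joined file, and on mismatch compare chunk checksums to find which parts to re-fetch) can be sketched as below. This is a minimal illustration only: CRC32, the `ChunkChecksums` class, and the chunk size are assumptions, not anything from Solr.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.zip.CRC32;

public class ChunkChecksums {
    // Checksum of a byte range; CRC32 is an assumed choice of algorithm.
    public static long checksum(byte[] data, int off, int len) {
        CRC32 crc = new CRC32();
        crc.update(data, off, len);
        return crc.getValue();
    }

    // One checksum per fixed-size chunk (last chunk may be shorter).
    public static List<Long> chunkChecksums(byte[] file, int chunkSize) {
        List<Long> sums = new ArrayList<>();
        for (int off = 0; off < file.length; off += chunkSize) {
            sums.add(checksum(file, off, Math.min(chunkSize, file.length - off)));
        }
        return sums;
    }

    // Compare master vs slave chunk checksums; return indexes to re-fetch.
    public static List<Integer> corruptedChunks(List<Long> expected, List<Long> actual) {
        List<Integer> bad = new ArrayList<>();
        for (int i = 0; i < expected.size(); i++) {
            if (!expected.get(i).equals(actual.get(i))) bad.add(i);
        }
        return bad;
    }
}
```

The point of the two-level check is that the whole-file checksum is cheap to verify on every download, while the per-chunk checksums only need to be fetched and compared in the (rare) failure case, so only the bad chunks go over the wire again.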
