On Tue, Apr 29, 2008 at 2:02 PM, Noble Paul നോബിള് नोब्ळ् <[EMAIL PROTECTED]> wrote:
> Solrj/BinaryResponseWriter should be used for calls to get metadata on
> the index. The actual index transfer must be done over simple http. I
> may propose a Simple BinaryRawResponseWriter for that.
>
> Sending a huge file in a single response is definitely a bad idea. It
> should be sent in chunks of, say, 10MB or so (configurable).
> It must also have some mechanism to generate checksums for the whole
> file and, if possible, for the chunks.

Checksumming is done by TCP (and by disk drives), so it's not strictly
necessary to maintain integrity. Might be a nice option for debugging though.

> A solution can look like this
> * getFileList: get the names of index files and their checksums
>   (NamedList response)
> * getFilePart: for 1..n of configured chunk size (simple binary output/http)
> * join parts 1..n and compare checksums
> * If it passes, keep the file and delete the parts
> * If it fails, get checksums for individual chunks (NamedList response)
> * and re-fetch the corrupted chunks (simple binary output/http)
>
> Once all the files are downloaded and the checksums are matched,
> trigger a snapinstall
>
> The details of the snapinstall in windows (with or without hardlinks)
> are still a bit fuzzy. But in the worst case scenario a copy should be ok
> (better than having no replication at all).

Now that Lucene has lockless commits and almost never changes files once
they are written, there are perhaps other options that would be better for
Windows. For the Lucene index, we might be able to avoid hard links
altogether and only copy new files. We could keep old segments from being
removed while in use with custom deletion policies. See
SnapshotDeletionPolicy for example:
http://hudson.zones.apache.org/hudson/job/Lucene-trunk/javadoc//org/apache/lucene/index/SnapshotDeletionPolicy.html

-Yonik
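
For illustration only, here is a rough sketch of the verification part of the quoted proposal: checksum per chunk, find the chunks that don't match, re-fetch just those. This is not Solr code. The class and method names (ChunkedTransferSketch, chunkChecksums, corruptedChunks, refetch) are made up for this example, CRC32 is an assumed checksum choice (the thread doesn't name one), and in-memory byte arrays stand in for the getFilePart calls over http. It also assumes the received copy has the same length as the source.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.zip.CRC32;

public class ChunkedTransferSketch {

    /** CRC32 of bytes [from, to) of data. CRC32 is an assumed choice of checksum. */
    public static long checksum(byte[] data, int from, int to) {
        CRC32 crc = new CRC32();
        crc.update(data, from, to - from);
        return crc.getValue();
    }

    /** "Server" side: one checksum per chunk of at most chunkSize bytes. */
    public static long[] chunkChecksums(byte[] file, int chunkSize) {
        int n = (file.length + chunkSize - 1) / chunkSize; // ceiling division
        long[] sums = new long[n];
        for (int i = 0; i < n; i++) {
            int from = i * chunkSize;
            int to = Math.min(file.length, from + chunkSize);
            sums[i] = checksum(file, from, to);
        }
        return sums;
    }

    /** "Client" side: indices of chunks whose checksum differs from the expected one. */
    public static List<Integer> corruptedChunks(byte[] received, long[] expected, int chunkSize) {
        long[] actual = chunkChecksums(received, chunkSize);
        List<Integer> bad = new ArrayList<>();
        for (int i = 0; i < expected.length; i++) {
            if (i >= actual.length || actual[i] != expected[i]) {
                bad.add(i);
            }
        }
        return bad;
    }

    /** Stand-in for re-fetching: copy only the bad chunks again from the source. */
    public static void refetch(byte[] source, byte[] received, List<Integer> bad, int chunkSize) {
        for (int i : bad) {
            int from = i * chunkSize;
            int len = Math.min(chunkSize, source.length - from);
            System.arraycopy(source, from, received, from, len);
        }
    }

    public static void main(String[] args) {
        int chunkSize = 4; // tiny for the demo; the thread suggests ~10MB, configurable
        byte[] source = "0123456789abcdef!".getBytes(StandardCharsets.US_ASCII);
        long[] expected = chunkChecksums(source, chunkSize);

        // Simulate a transfer that damaged one byte in chunk 1 and one in chunk 4.
        byte[] received = source.clone();
        received[5] ^= 0x01;
        received[16] ^= 0x01;

        // Whole-file check fails, so fall back to per-chunk checksums.
        boolean wholeOk = checksum(received, 0, received.length)
                == checksum(source, 0, source.length);
        List<Integer> bad = corruptedChunks(received, expected, chunkSize);
        System.out.println(wholeOk + " " + bad); // prints "false [1, 4]"

        refetch(source, received, bad, chunkSize);
        System.out.println(Arrays.equals(source, received)); // prints "true"
    }
}
```

In a real implementation the per-chunk checksums would come back in the NamedList response and each re-fetch would be a separate http request; the point here is only that a failed whole-file check costs a re-transfer of the damaged chunks, not the whole file.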
