Dear Wiki user, You have subscribed to a wiki page or wiki category on "Solr Wiki" for change notification.
The following page has been changed by YonikSeeley:
http://wiki.apache.org/solr/CollectionBuilding

The comment on the change is:
corrections

------------------------------------------------------------------------------
- = Collection Building =
+ = Collection Rebuilding =
- Collection Building is creating a new index from scratch, generally a "manual" effort. A full rebuild in which a new collection replaces the old collection would be required in cases such as the following:
+ Collection rebuilding is creating an index from scratch (not incremental updates). A full rebuild in which a new collection replaces the old collection would be required in cases such as the following:
  * When building a new collection with no previous collection data existing.
  * When launching something new.
- * When a collection has become corrupted to a greater or lesser extent.
+ * When a collection has become corrupted for some reason.
- * When redefining an existing field type—changing your schema in a way that requires a rebuild. For example, merely adding fields to the schema does not require a rebuild, but changing some field types from a simple integer to some exotic type of integer does.
+ * When redefining an existing field type—changing your schema in a way that requires a rebuild. For example, merely adding fields to the schema does not require a rebuild, but changing the type of a field would.
- == Recommended Procedure for New Index Building ==
+ == A Procedure for New Index Building ==
  Perform the procedure below from the master server to do collection rebuilds in a production environment.
  1. Turn off distribution by running '''rsyncd-stop'''. This prevents the slaves from getting data from the master. [[BR]] '''Note:''' Ensure that a distribution is not running when you run rsyncd-stop.
  1. Run the script, '''abc''' (Atomic Backup post-Commit), to create a snapshot for a safe backup.
+ 1. If you have a separate process that does incremental updating that might come in while you are performing this procedure, you may want to disable it.
- 1. Delete/clean-out the active directory, '''./index''', on the master server.
+ 1. Remove the index directory, '''./solr/data/index/''', on the master server.
- 1. Disable incremental updating that might come in while you are performing this procedure. Use an event daemon, or the crontab, for example.
  1. If you have changes to the schema or any new configurations to be installed, stop the server. Make the changes to the schema/configurations and install them.
  1. Restart the server.
- 1. Run the index builder. Build time is variable depending upon amount and type of data that you have. You many want to monitor the build if it is a long or complex one.
- 1. Run the script, '''abo''' (Atomic Backup post-Optimize), to optimize the collection. [[BR]] '''Note:''' if you know that a large number of incremental updates are still in process from Step 4, wait until they are done before running abo.
- 1. Run the '''rsyncd-start''' script to re-enable collection distribution requests from the slaves. The new collection data will be pulled by the slaves while still serving requests.
+ 1. Re-index all of your documents.
+ 1. Run the script, '''optimize''', to optimize the collection.
+ 1. Re-enable index distribution with the '''rsyncd-start''' script. The new collection data will be pulled by the slaves while still serving requests.
+
+ Note: If you have configured Solr to take snapshots only for optimized indicies, and have an index builder that only issues optimize commands when the index is completely rebuilt, you can skip steps dealing with disabling distribution.
  == Alternative Approaches for New Index Building ==
- * Create an "offline" solar port, index from scratch on the offline port, disable snapshot pulling, shut down the master, copy the index from the offline port to the master, enable snapshot pulling.
+ * Create an "offline" solr port, index from scratch on the offline port, disable snapshot pulling, shut down the master, copy the index from the offline port to the master, enable snapshot pulling.
- * Create an "offline" solar port, index from scratch on the offline port, disable snapshot pulling, shut down the master, copy the index from the offline port to the master, disable slave boxes one-at-a-time and copy the index to each manual, enable snapshot pulling. (This last one in particular reqires a lot more setup time and thought.)
+ * Create an "offline" solr port, index from scratch on the offline port, disable snapshot pulling, shut down the master, copy the index from the offline port to the master, disable slave boxes one-at-a-time and copy the index to each manual, enable snapshot pulling. (This last one in particular reqires a lot more setup time and thought.)
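
For reference, the revised procedure (the "+" side of the change above) could be laid out as a dry-run plan. This is only a minimal sketch under several assumptions: the script names rsyncd-stop, abc, optimize and rsyncd-start come from the page, but the `BIN_DIR` location, the `rm -rf` used to remove the index directory, and the `reindex-all-documents` command are hypothetical placeholders, and stopping or restarting the server is left as a manual step because it depends on the installation. Nothing is executed; the code only lists the steps in order.

```python
import shlex

# Assumed locations; adjust for the actual installation.
BIN_DIR = "solr/bin"              # assumption: where the distribution scripts live
INDEX_DIR = "./solr/data/index/"  # index directory named in the procedure


def rebuild_plan(schema_changed=False, reindex_cmd=("reindex-all-documents",)):
    """Return (description, command) pairs for a full rebuild, in order.

    A command of None marks a manual, site-specific step.
    reindex_cmd is a hypothetical placeholder for your own index builder.
    """
    steps = [
        ("Turn off distribution", [f"{BIN_DIR}/rsyncd-stop"]),
        ("Snapshot for a safe backup", [f"{BIN_DIR}/abc"]),
        ("Disable any incremental updater", None),
        ("Remove the index directory", ["rm", "-rf", INDEX_DIR]),
    ]
    if schema_changed:
        # Only needed when schema or configuration changes must be installed.
        steps.append(("Stop the server and install schema/config changes", None))
    steps += [
        ("Restart the server", None),
        ("Re-index all documents", list(reindex_cmd)),
        ("Optimize the collection", [f"{BIN_DIR}/optimize"]),
        ("Re-enable distribution", [f"{BIN_DIR}/rsyncd-start"]),
    ]
    return steps


def format_plan(steps):
    """Render the plan as numbered lines without running anything."""
    lines = []
    for number, (description, command) in enumerate(steps, start=1):
        action = shlex.join(command) if command else "(manual step)"
        lines.append(f"{number}. {description}: {action}")
    return lines


if __name__ == "__main__":
    print("\n".join(format_plan(rebuild_plan(schema_changed=True))))
```

Keeping the plan as data rather than running it directly makes it easy to review the order (distribution off first, back on last) before anything touches the master.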
