I am looking for advice on how to vertically partition an index (split each document's fields across more than one core/instance).

Some background:
- Our system stores all document metadata in database tables
- The contents of each document are stored on a filesystem
- Metadata changes frequently, and the index must be updated to match (e.g. a delay of minutes, not hours)
- Contents change infrequently, and are costly to reindex (large files, complex analyzers)

Having the contents stored in the same index as the metadata means that they will be frequently and needlessly reanalyzed. This wastes a lot of cycles: a large number of documents may have only a single field changed, but the system ends up re-analyzing the gigabytes of text contents for these documents.

One suggested solution was to store the contents field, and copy the field (rather than re-analyze it) each time a document is reindexed. However, this would waste a lot of storage, as we have terabytes of documents.

We are currently looking at a vertical partitioning scheme that uses multiple Solr cores. One core contains the schema for all the metadata; the other core has the schema for the contents. We have successfully built a custom request handler that pushes documents to both cores, effectively producing the split indexes.

The problem now is how to split the queries across both cores. Given that there could be AND/OR/NOT clauses containing both metadata and contents fields, we'll need to find some way to divide a query into two different parts that can be run on each core, and have the hits joined back together afterwards. This is similar to the sharding feature, but requires intersection as well as union of result hits.

Does anyone have any advice on how to go about dividing up the different query clauses, and how we could merge results? Or can anyone suggest a different approach to vertical partitioning?

thanks
-Mark

--
View this message in context: http://www.nabble.com/Vertical-Partitioning-advice-tp21906668p21906668.html
Sent from the Solr - User mailing list archive at Nabble.com.
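
To make the split-and-merge question concrete, here is a rough Python sketch of one possible shape for it, not a Solr feature or a recommended design. It assumes the query has already been parsed into a tree whose leaves each belong to exactly one core, that both cores share a document ID, and that NOT only appears as "A AND NOT B". The names `Term`, `And`, `Or`, `AndNot`, `evaluate` and `fake_core` are all hypothetical, and the two cores are simulated with in-memory dictionaries rather than real Solr requests.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Set, Union

# Hypothetical stand-in for "run this clause on one core, return matching doc IDs".
# In a real setup this would be a request to the corresponding Solr core.
Search = Callable[[str], Set[str]]


@dataclass(frozen=True)
class Term:
    core: str    # which core owns the field, e.g. "metadata" or "contents"
    clause: str  # the clause to send to that core


@dataclass(frozen=True)
class And:
    left: "Node"
    right: "Node"


@dataclass(frozen=True)
class Or:
    left: "Node"
    right: "Node"


@dataclass(frozen=True)
class AndNot:
    keep: "Node"
    drop: "Node"


Node = Union[Term, And, Or, AndNot]


def evaluate(node: Node, cores: Dict[str, Search]) -> Set[str]:
    """Run each leaf on its own core, then merge ID sets bottom-up."""
    if isinstance(node, Term):
        return cores[node.core](node.clause)
    if isinstance(node, And):
        return evaluate(node.left, cores) & evaluate(node.right, cores)   # intersection
    if isinstance(node, Or):
        return evaluate(node.left, cores) | evaluate(node.right, cores)   # union
    if isinstance(node, AndNot):
        return evaluate(node.keep, cores) - evaluate(node.drop, cores)    # difference
    raise TypeError(f"unknown node type: {type(node).__name__}")


def fake_core(index: Dict[str, Set[str]]) -> Search:
    """Simulated core: maps a clause string to a fixed set of doc IDs."""
    return lambda clause: set(index.get(clause, set()))


cores = {
    "metadata": fake_core({"author:mark": {"d1", "d2", "d3"}, "status:draft": {"d3"}}),
    "contents": fake_core({"text:solr": {"d2", "d3", "d4"}}),
}

# (author:mark AND text:solr) AND NOT status:draft
query = AndNot(
    And(Term("metadata", "author:mark"), Term("contents", "text:solr")),
    Term("metadata", "status:draft"),
)
result = evaluate(query, cores)
```

Note that this materialises the full ID set for every leaf and ignores relevance scores, so it only illustrates the intersection/union step; with large result sets, the cost of pulling all IDs from each core would need separate thought.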