Re: Solr: separating index and storage

2013-06-06 Thread Sourajit Basak
Absolutely. Solr will return the reference along with the docs/results; those references can then be used to look up the actual stuff. Such use cases aren't hard to solve. If the use case demands returning the actual stuff alongside the results, it becomes non-trivial, especially under high load. To

Re: Solr: separating index and storage

2013-06-06 Thread Erick Erickson
By and large, stored fields are pretty irrelevant for resource consumption _except_ for disk space consumed. Sharded systems work fine; the stored data is kept in the index files (*.fdt and *.fdx) in each segment on each shard. But you haven't told us anything about your data. How much

Re: Solr: separating index and storage

2013-06-06 Thread Sourajit Basak
Each day the index grows by ~250 MB; however, I am anticipating that this growth will slow down because there will be repetitions (just a guess). It's not the order of growth but the limitation of our infrastructure. Basically a budgetary constraint :-) Apparently there seems to be no problem other than disk

Re: Solr: separating index and storage

2013-06-06 Thread Erick Erickson
bq: I am anticipating that this growth will slow down because there will be repetitions
This will be true for your indexed data, but NOT for your stored data. Each stored field is stored as-is per document. It'll be compressed, so won't take up the entire 250 MB, but it'll still be stored. FWIW,
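To make the indexed-versus-stored distinction concrete, here is a toy Python sketch. It is not how Lucene works internally: it ignores postings lists (which still grow with every document) and stored-field compression, and the function name `index_growth` is made up for illustration. It only shows that the set of unique terms stops growing once documents repeat, while the raw stored bytes keep growing per document.

```python
def index_growth(documents):
    """Track unique-term count vs. cumulative stored bytes (toy model)."""
    terms = set()          # stand-in for the term dictionary of the index
    stored_bytes = 0       # stand-in for uncompressed stored-field data
    history = []
    for text in documents:
        terms.update(text.lower().split())          # repeated terms add nothing
        stored_bytes += len(text.encode("utf-8"))   # every document is stored as-is
        history.append((len(terms), stored_bytes))
    return history


# Three identical documents: the term count plateaus, stored bytes do not.
docs = ["solr stores fields"] * 3
growth = index_growth(docs)
for unique_terms, total_bytes in growth:
    print(unique_terms, total_bytes)
```

In this model the first number stays flat after the first document, while the second grows by the full document size every time.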

Re: Solr: separating index and storage

2013-06-04 Thread Erick Erickson
You have to index something with your Solr documents that has meaning in _your_ system so you can find the original record. You don't search this field, you just return it with the search results and then use it to get the original document. If you're storing the original in a DB, this can be the
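A minimal Python sketch of the lookup pattern described above, under stated assumptions: the field name `doc_id`, the table layout, and the helper names are hypothetical, the `hits` list merely stands in for the docs returned by a Solr query, and an in-memory SQLite database stands in for whatever system holds the originals.

```python
import sqlite3


def build_doc_store(docs):
    """Create a hypothetical document store keyed by the same ID indexed in Solr."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE docs (doc_id TEXT PRIMARY KEY, body TEXT)")
    conn.executemany("INSERT INTO docs VALUES (?, ?)", list(docs.items()))
    return conn


def fetch_originals(conn, search_hits, key_field="doc_id"):
    """Resolve the reference key in each search hit to the original record."""
    originals = []
    for hit in search_hits:
        row = conn.execute(
            "SELECT body FROM docs WHERE doc_id = ?", (hit[key_field],)
        ).fetchone()
        # None marks a key that is in the index but missing from the store.
        originals.append((hit[key_field], row[0] if row else None))
    return originals


store = build_doc_store({
    "rec-1": "First original sentence.",
    "rec-2": "Second original sentence.",
})

# Stand-in for search results: only the reference key comes back, in ranked order.
hits = [{"doc_id": "rec-2"}, {"doc_id": "rec-1"}]
results = fetch_originals(store, hits)
print(results)
```

The search side only ever returns the key; the ranking order of the hits is preserved while the bodies come from the separate store.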

Solr: separating index and storage

2013-06-03 Thread Sourajit Basak
Consider the following use case. Certain words are extracted from a document and indexed. The exact sentence containing the word cannot be stored alongside the extracted word because of the volume at which the documents grow. How can the index and, let's call them, doc servers be separated? An