Each day the index grows by ~250 MB; however, I anticipate that this growth will slow down because there will be repetitions (just a guess). The concern is not the rate of growth but the limits of our infrastructure. Basically a budgetary constraint :-)
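As a rough back-of-the-envelope check on the ~250 MB/day figure, the arithmetic can be sketched as below. The disk budget and current index size in the usage note are made-up illustration values, not numbers from our setup, and the sketch assumes constant growth (so it is pessimistic if repetitions do slow things down).

```python
def yearly_growth_gb(daily_growth_mb: float = 250.0) -> float:
    """Projected index growth over 365 days, in decimal GB (1 GB = 1000 MB)."""
    return daily_growth_mb * 365 / 1000


def days_until_full(disk_budget_mb: float,
                    current_index_mb: float,
                    daily_growth_mb: float = 250.0) -> int:
    """Whole days of growth that still fit inside the disk budget.

    Assumes constant daily growth; returns 0 if the budget is already used up.
    """
    if daily_growth_mb <= 0:
        raise ValueError("daily_growth_mb must be positive")
    remaining_mb = disk_budget_mb - current_index_mb
    if remaining_mb <= 0:
        return 0
    return int(remaining_mb // daily_growth_mb)
```

For example, `yearly_growth_gb()` gives the growth per year at the current rate, and `days_until_full(100_000, 25_000)` gives the headroom for a hypothetical 100 GB budget with 25 GB already used.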
Apparently there is no problem other than disk space. So we will go ahead with the idea of stored fields.

On Thu, Jun 6, 2013 at 5:03 PM, Erick Erickson <erickerick...@gmail.com> wrote:

> By and large, stored fields are pretty irrelevant for resource
> consumption _except_ for disk space consumed. Sharded systems work
> fine; the stored data is stored in the index files (*.fdt and *.fdx)
> in each segment on each shard.
>
> But you haven't told us anything about your data. How much are
> you talking about here? 100s of G? Terabytes? Other than disk
> space, you may well be anticipating problems that don't exist...
>
> Now, when _returning_ documents the fields must be read, so
> there is some resource consumption there which you can
> mitigate with lazy field loading. But this is usually just a few docs
> so often isn't a problem.
>
> Best
> Erick
>
> On Thu, Jun 6, 2013 at 3:34 AM, Sourajit Basak <sourajit.ba...@gmail.com> wrote:
> > Absolutely. Solr will return the reference along with the docs/results; those
> > references may be used to look up the actual stuff. Such use cases aren't
> > hard to solve.
> >
> > If the use case demands returning the actual stuff alongside the results,
> > it becomes non-trivial, especially during high loads.
> >
> > To avoid this and do a quick implementation I can judiciously create stored
> > fields and see how it performs. I will need to figure out what happens if
> > the volume growth of stored fields is high, how much is the disk I/O and
> > what happens if we shard the index, like, what happens to the stored fields
> > then.
> >
> > Best,
> > Sourajit
> >
> > On Tue, Jun 4, 2013 at 5:31 PM, Erick Erickson <erickerick...@gmail.com> wrote:
> >
> >> You have to index something with your Solr documents that
> >> has meaning in _your_ system so you can find the
> >> original record. You don't search this field, you just
> >> return it with the search results and then use it to get
> >> the original document.
> >>
> >> If you're storing the original in a DB, this can be the PK.
> >> If on a file system the path. etc.
> >>
> >> Essentially, since the association is specific to your environment
> >> you need to handle it explicitly...
> >>
> >> Best
> >> Erick
> >>
> >> On Mon, Jun 3, 2013 at 11:56 AM, Sourajit Basak
> >> <sourajit.ba...@gmail.com> wrote:
> >> > Consider the following use case.
> >> >
> >> > Certain words are extracted from a document and indexed. The exact sentence
> >> > containing the word cannot be stored alongside the extracted word because
> >> > of the volume at which the documents grow. How can the index and, let's call
> >> > it, doc servers be separated?
> >> >
> >> > An option is to store the sentences in MongoDB or an RDBMS. But there seems
> >> > to be a schema level design issue. Assuming 'word' to be a multivalued
> >> > field, how do we associate to it a reference to the corresponding entry in
> >> > the doc server?
> >> >
> >> > May create (word_1, ref_1) tuples. Is there any other in-built feature?
> >> >
> >> > Any related project which separates index & doc servers?
> >> >
> >> > Thanks,
> >> > Sourajit
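
For anyone finding this thread later, the reference pattern discussed in the quoted messages (return a key with each hit, then use it to fetch the original) can be sketched roughly as follows. This is only an illustration with in-memory stand-ins: `WordIndex`, `resolve`, and the `s1`/`s2` keys are hypothetical names, the dict plays the role of the doc server (MongoDB, an RDBMS, or a file system), and no Solr API is used.

```python
from collections import defaultdict


class WordIndex:
    """Toy stand-in for the search index: maps a word to document references.

    The reference is returned with each hit but is never searched itself,
    mirroring a stored-but-not-searched key such as a DB primary key or path.
    """

    def __init__(self):
        self._postings = defaultdict(list)

    def add(self, word, ref):
        # Store a (word, ref) pair; words are matched case-insensitively.
        self._postings[word.lower()].append(ref)

    def search(self, word):
        # Return a copy of the references recorded for this word.
        return list(self._postings.get(word.lower(), []))


def resolve(refs, doc_store):
    """Fetch the original sentences from the separate doc store by reference."""
    return [doc_store[ref] for ref in refs if ref in doc_store]


# Hypothetical doc server contents, keyed by reference.
doc_store = {
    "s1": "Stored fields mostly cost disk space.",
    "s2": "Sharding splits the index across machines.",
}

index = WordIndex()
index.add("stored", "s1")
index.add("sharding", "s2")

hits = index.search("Sharding")
sentences = resolve(hits, doc_store)
```

The search step only ever handles small keys, so the bulky sentences stay out of the index; the second look-up is the price paid for that separation.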