Hi and thanks, Emir! FieldType might indeed be another layer where the logic could live.
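For anyone following along: the ExternalFileField Emir points to below keeps per-document values in a file next to the index rather than inside it, but note it only supports float values usable in function queries, so out of the box it cannot hold stored text for highlighting. A minimal schema sketch (the field and type names here are made up):

```xml
<!-- schema.xml sketch: values come from external_rank* files in the
     index data dir, keyed by the uniqueKey field "id" -->
<fieldType name="extRank" class="solr.ExternalFileField"
           keyField="id" defVal="0"/>
<field name="rank" type="extRank" indexed="false" stored="false"/>
```

So it is mainly useful as a model of the "values loaded from outside the index" hook, not as a drop-in replacement for a stored text field.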
On Wed, Feb 21, 2018 at 6:32 AM, Emir Arnautović <emir.arnauto...@sematext.com> wrote:

> Hi,
> Maybe you could use the external field type as an example of how to hook up
> values from a DB:
> https://lucene.apache.org/solr/guide/6_6/working-with-external-files-and-processes.html
>
> HTH,
> Emir
> --
> Monitoring - Log Management - Alerting - Anomaly Detection
> Solr & Elasticsearch Consulting Support Training - http://sematext.com/
>
>> On 20 Feb 2018, at 20:39, Roman Chyla <roman.ch...@gmail.com> wrote:
>>
>> Say there is high load and I'd like to bring up a new machine and let it
>> replicate the index. If 100 GB or more can be shaved off, it will have a
>> significant impact on how quickly the new searcher is ready and added to
>> the cluster. The impact on search speed is likely minimal.
>>
>> We are investigating the idea of two clusters, but I have to say it seems
>> to me more complex than storing/loading a field from an external source.
>> Having said that, I wonder why this was not done before (maybe it was) and
>> what the cons are (besides the obvious ones: maintenance, and the database
>> being a potential point of failure; well, in that case I'd miss
>> highlights - I can live with that...)
>>
>>> On Tue, Feb 20, 2018 at 10:36 AM, David Hastings
>>> <hastings.recurs...@gmail.com> wrote:
>>>
>>> It really depends on what you consider too large, and why the size is a
>>> big issue, since most replication will go at about 100 MB/second give or
>>> take, and replicating a 300 GB index is only an hour or two. What I do
>>> for this purpose is store my text in a separate index altogether, and
>>> call on that core for highlighting. So for my use case, the primary
>>> index with no stored text is around 300 GB and replicates as needed,
>>> and the full-text indexes with stored text total around 500 GB and are
>>> replicating non-stop.
>>> All searching goes against the primary index, and for highlighting I
>>> call on the full-text indexes, which have a stupid-simple schema. This
>>> has worked pretty well for me, at least.
>>>
>>>> On Tue, Feb 20, 2018 at 10:27 AM, Roman Chyla <roman.ch...@gmail.com>
>>>> wrote:
>>>>
>>>> Hello,
>>>>
>>>> We have a use case of a very large index (master-slave; for unrelated
>>>> reasons the search cannot work in cloud mode) - one of the fields is a
>>>> very large text field, stored mostly for highlighting. To cut down the
>>>> index size (for purposes of replication/scaling), I thought I could
>>>> try to save it in a database - and not in the index.
>>>>
>>>> Lucene has codecs - one of their methods is for 'stored fields', so
>>>> that seems like a natural path for me.
>>>>
>>>> However, I'd expect somebody else has had a similar problem before. I
>>>> googled and couldn't find any solutions. Using the codecs seems like a
>>>> really good fit for this particular problem - am I missing something?
>>>> Is there a better way to cut down on index size? (besides SolrCloud,
>>>> sharding, compression)
>>>>
>>>> Thank you,
>>>>
>>>> Roman
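To make David's two-index pattern above concrete, here is a rough Python sketch of the request flow: search the lean primary core for ids only, then ask the stored-text core for highlighting of just that page of ids. The host, core names, and field names are all hypothetical; `hl` and `hl.fl` are standard Solr highlighting parameters.

```python
from urllib.parse import urlencode

SOLR = "http://localhost:8983/solr"  # hypothetical host

def primary_search_params(user_query, rows=10):
    """Query the lean index: return only ids and scores, no stored text."""
    return {"q": user_query, "fl": "id,score", "rows": rows}

def highlight_params(user_query, ids, text_field="fulltext"):
    """Ask the stored-text core for highlighting of exactly this page of ids."""
    return {
        "q": user_query,
        "fq": "id:(%s)" % " OR ".join(ids),  # restrict to the current page
        "fl": "id",
        "hl": "true",
        "hl.fl": text_field,
    }

def request_url(core, params):
    """Build the /select URL for a given core."""
    return "%s/%s/select?%s" % (SOLR, core, urlencode(params))

if __name__ == "__main__":
    ids = ["doc1", "doc7"]
    print(request_url("primary", primary_search_params("solar flares")))
    print(request_url("fulltext_core", highlight_params("solar flares", ids)))
```

The point of the split is that only the small primary core has to be replicated quickly; the heavy stored-text core can lag behind, since it is consulted only for snippets on the handful of ids actually displayed.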