Hi and thanks, Emir! FieldType might indeed be another layer where the logic could live.
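For anyone following along: the ExternalFileField Emir points to below keeps per-document values in a file next to the index rather than inside it, but note it only supports float values usable in function queries, so out of the box it cannot hold stored text for highlighting. A minimal schema sketch (the field and type names here are made up):

```xml
<!-- schema.xml sketch: values come from external_rank* files in the
     index data dir, keyed by the uniqueKey field "id" -->
<fieldType name="extRank" class="solr.ExternalFileField"
           keyField="id" defVal="0"/>
<field name="rank" type="extRank" indexed="false" stored="false"/>
```

So it is mainly useful as a model of the "values loaded from outside the index" hook, not as a drop-in replacement for a stored text field.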
On Wed, Feb 21, 2018 at 6:32 AM, Emir Arnautović <emir.arnauto...@sematext.com> wrote:

> Hi,
> Maybe you could use the external field type as an example of how to hook up
> values from a DB:
> https://lucene.apache.org/solr/guide/6_6/working-with-external-files-and-processes.html
>
> HTH,
> Emir
> --
> Monitoring - Log Management - Alerting - Anomaly Detection
> Solr & Elasticsearch Consulting Support Training - http://sematext.com/
>
>> On 20 Feb 2018, at 20:39, Roman Chyla <roman.ch...@gmail.com> wrote:
>>
>> Say there is high load and I'd like to bring up a new machine and let it
>> replicate the index. If 100 GB or more can be shaved off, it will have a
>> significant impact on how quickly the new searcher is ready and added to
>> the cluster. The impact on search speed is likely minimal.
>>
>> We are investigating the idea of two clusters, but I have to say it seems
>> to me more complex than storing/loading a field from an external source.
>> Having said that, I wonder why this was not done before (maybe it was) and
>> what the cons are (besides the obvious ones: maintenance, and the database
>> being a potential point of failure; well, in that case I'd miss
>> highlights - I can live with that...)
>>
>>> On Tue, Feb 20, 2018 at 10:36 AM, David Hastings
>>> <hastings.recurs...@gmail.com> wrote:
>>>
>>> It really depends on what you consider too large, and why the size is a
>>> big issue, since most replication will go at about 100 MB/second give or
>>> take, and replicating a 300 GB index is only an hour or two. What I do
>>> for this purpose is store my text in a separate index altogether, and
>>> call on that core for highlighting. So for my use case, the primary
>>> index with no stored text is around 300 GB and replicates as needed,
>>> and the full-text indexes with stored text total around 500 GB and are
>>> replicating non-stop.
>>> All searching goes against the primary index, and for highlighting I
>>> call on the full-text indexes, which have a stupid-simple schema. This
>>> has worked pretty well for me, at least.
>>>
>>>> On Tue, Feb 20, 2018 at 10:27 AM, Roman Chyla <roman.ch...@gmail.com>
>>>> wrote:
>>>>
>>>> Hello,
>>>>
>>>> We have a use case of a very large index (master-slave; for unrelated
>>>> reasons the search cannot work in cloud mode) - one of the fields is a
>>>> very large text field, stored mostly for highlighting. To cut down the
>>>> index size (for purposes of replication/scaling), I thought I could
>>>> try to save it in a database - and not in the index.
>>>>
>>>> Lucene has codecs - one of their methods is for 'stored fields', so
>>>> that seems like a natural path for me.
>>>>
>>>> However, I'd expect somebody else has had a similar problem before. I
>>>> googled and couldn't find any solutions. Using the codecs seems like a
>>>> really good fit for this particular problem - am I missing something?
>>>> Is there a better way to cut down on index size? (besides SolrCloud,
>>>> sharding, compression)
>>>>
>>>> Thank you,
>>>>
>>>> Roman
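To make David's two-index pattern above concrete, here is a rough Python sketch of the request flow: search the lean primary core for ids only, then ask the stored-text core for highlighting of just that page of ids. The host, core names, and field names are all hypothetical; `hl` and `hl.fl` are standard Solr highlighting parameters.

```python
from urllib.parse import urlencode

SOLR = "http://localhost:8983/solr"  # hypothetical host

def primary_search_params(user_query, rows=10):
    """Query the lean index: return only ids and scores, no stored text."""
    return {"q": user_query, "fl": "id,score", "rows": rows}

def highlight_params(user_query, ids, text_field="fulltext"):
    """Ask the stored-text core for highlighting of exactly this page of ids."""
    return {
        "q": user_query,
        "fq": "id:(%s)" % " OR ".join(ids),  # restrict to the current page
        "fl": "id",
        "hl": "true",
        "hl.fl": text_field,
    }

def request_url(core, params):
    """Build the /select URL for a given core."""
    return "%s/%s/select?%s" % (SOLR, core, urlencode(params))

if __name__ == "__main__":
    ids = ["doc1", "doc7"]
    print(request_url("primary", primary_search_params("solar flares")))
    print(request_url("fulltext_core", highlight_params("solar flares", ids)))
```

The point of the split is that only the small primary core has to be replicated quickly; the heavy stored-text core can lag behind, since it is consulted only for snippets on the handful of ids actually displayed.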