As long as your documents are simple in structure (a key value or an array for any given field), you're good to go. Anything multi-level, and you're out of luck. Not sure how relevant this link still is, but: https://stackoverflow.com/questions/22192904/is-solr-support-complex-types-like-structure-for-multivalued-fields
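To make the limitation above concrete, here is a small sketch (not from the thread; the field names follow common Solr dynamic-field suffix conventions and are illustrative) of which document shapes fit Solr's flat schema and which do not:

```python
# A flat document: every field is a scalar or an array of scalars.
# This maps directly onto a classic Solr schema ("_ss" = multivalued string).
flat_doc = {
    "id": "prod-1",
    "name_s": "Widget",
    "tags_ss": ["red", "sale"],  # plain array: fine for a multivalued field
}

# A multi-level structure: an array of objects under one field.
# There is no field type for this shape in a flat Solr schema.
nested_doc = {
    "id": "prod-2",
    "skus": [
        {"color": "red", "size": "M"},
        {"color": "blue", "size": "L"},
    ],
}

def is_flat(doc):
    """True if every field value is a scalar or a list of scalars."""
    scalar = (str, int, float, bool)
    for value in doc.values():
        if isinstance(value, scalar):
            continue
        if isinstance(value, list) and all(isinstance(v, scalar) for v in value):
            continue
        return False
    return True

print(is_flat(flat_doc))    # True
print(is_flat(nested_doc))  # False
```

The second shape is what the nested-documents feature discussed below exists for; without it, the nesting has to be flattened or serialized away before indexing.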
It's from 2017, but I believe it still holds true. However, there are possibilities with nested documents: https://solr.apache.org/guide/8_1/indexing-nested-documents.html Admittedly, I have not gotten too in depth myself with child documents for more complex data structures.

And yes, you could just store the complex data structure as JSON in a single large text field that is stored but not indexed, and only index the fields you will be searching on.

Another option I've experimented with is two completely different cores, or even two completely different Solr servers (I use standalone a lot): use one for searching, then use the result to pull the raw data from the other "storage server" by an identifier. This is actually surprisingly fast. It's a hack, and you're using the wrong tool for the job, but it can be done if you REALLY want to and get creative.

Good luck. Curious to hear what you come up with.

-dave

On Fri, Apr 8, 2022 at 8:36 PM James Greene <ja...@jamesaustingreene.com> wrote:

> I think you are speaking to the point that the requirement to have all your
> data rebuildable from source isn't a hard requirement, as there are ways to
> re-index without having access to the original source (you still need the
> full docs stored in Solr, just not indexed). By looking at Solr from that
> POV it becomes more approachable as a primary data store.
>
> On Fri, Apr 8, 2022, 1:53 PM dmitri maziuk <dmitri.maz...@gmail.com> wrote:
>
> > On 2022-04-07 11:51 PM, Shawn Heisey wrote:
> > ...
> > > As I understand it, ES offers reindex capability by storing the entire
> > > input document into a field in the index. Which means that the index
> > > will be a lot bigger than it needs to be, which is going to affect
> > > performance. If the field is not indexed, then the performance impact
> > > may not be huge, but it will not be zero. And it wouldn't really
> > > improve the speed of a full reindex; it just makes it possible to do a
> > > reindex without an external data source.
> > > The same thing can be done with Solr, and it is something I would
> > > definitely say needs to be part of any index design where Solr will be a
> > > primary data store. That capability should be available in Solr, but I
> > > do not think it should be enabled by default.
> >
> > What would be the advantage over dumping the documents into a text file
> > (XML, JSON) and doing a full re-import? In principle you could dump
> > everything Solr needs into the file and only check that it's all there
> > during the import; that plus the protocol overhead would be the only
> > downside. And deleting the existing index will take a little extra time.
> >
> > The upside is that we can stick the files into git and have versions, it
> > should compress really well, we can clone it to off-site storage, etc. etc.
> >
> > Dima
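Dima's dump-to-file approach can be sketched roughly as follows. This is an assumption-laden illustration, not from the thread: it assumes a JSON-lines dump format (one document per line, which diffs cleanly in git and compresses well), and the file path and the Solr URL in the comment are placeholders.

```python
import json
import os
import tempfile

def dump_docs(docs, path):
    # One JSON object per line; sorted keys keep the dump stable across runs,
    # so git diffs only show real document changes.
    with open(path, "w") as f:
        for doc in docs:
            f.write(json.dumps(doc, sort_keys=True) + "\n")

def load_docs(path):
    with open(path) as f:
        return [json.loads(line) for line in f]

docs = [
    {"id": "1", "title_s": "first"},
    {"id": "2", "title_s": "second"},
]
path = os.path.join(tempfile.gettempdir(), "solr-dump.jsonl")
dump_docs(docs, path)

# Round trip: everything Solr needs is recoverable from the file alone.
assert load_docs(path) == docs

# A full re-import would then POST the dump back to Solr, e.g. via the
# /update/json/docs handler (URL and core name are placeholders):
#   curl 'http://localhost:8983/solr/mycore/update/json/docs?commit=true' \
#        --data-binary @solr-dump.jsonl -H 'Content-Type: application/json'
```

The versioned dump file plays the role of the "external data source" Shawn mentions: the index stays lean because nothing extra is stored in Solr itself, at the cost of keeping the dump in sync on every update.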