Solr has a whole pipeline that you can run during document ingestion, before
the actual indexing happens. It is called the Update Request Processor (URP)
chain and is defined in solrconfig.xml or in an override file. Obviously, since
you are indexing from a SolrJ client, you have even more flexibility, but it
is good to know about anyway.

You can read all about it at
https://lucene.apache.org/solr/guide/8_6/update-request-processors.html and
see the extensive list of processors you can leverage. The one Erick
mentioned is documented here:
https://lucene.apache.org/solr/8_6_0//solr-core/org/apache/solr/update/processor/StatelessScriptUpdateProcessorFactory.html

Just a word of warning: the Stateless Script URP is typically used with
JavaScript, which is becoming a bit of a complicated story as the underlying
JVM is upgraded (the Nashorn JavaScript engine was deprecated in JDK 11 and
removed in JDK 15). So if one of the simpler URPs, or a chain of them, will
do the job, that may be a better path to take.

Regards,
   Alex.


On Thu, 17 Sep 2020 at 13:13, Steven White <swhite4...@gmail.com> wrote:

> Thanks Erick.  Where can I learn more about the "stateless script update
> processor factory"?  I don't know what you mean by this.
>
> Steven
>
> On Thu, Sep 17, 2020 at 1:08 PM Erick Erickson <erickerick...@gmail.com> wrote:
>
> > 1000 fields is fine, you'll waste some cycles on bookkeeping, but I really
> > doubt you'll notice. That said, are these fields used for searching?
> > Because you do have control over what goes into the index if you put a
> > "stateless script update processor factory" in your update chain. There
> > you can do whatever you want, including combining all the fields into one
> > and deleting the original fields. There's no point in having your index
> > cluttered with unused fields. OTOH, it may not be worth the effort just to
> > satisfy my sense of aesthetics 😉
> >
> > On Thu, Sep 17, 2020, 12:59 Steven White <swhite4...@gmail.com> wrote:
> >
> > > Hi Erick,
> > >
> > > Yes, this is coming from a DB.  Unfortunately I have no control over the
> > > list of fields.  Out of the 1000 or so fields, no document that gets
> > > indexed into Solr will use more than about 50, and since I'm copying the
> > > values of those fields to the catch-all field and the catch-all field is
> > > my default search field, I don't expect any problem from having 1000
> > > fields in Solr's schema, or should I?
> > >
> > > Thanks
> > >
> > > Steven
> > >
> > >
> > > On Thu, Sep 17, 2020 at 8:23 AM Erick Erickson <erickerick...@gmail.com> wrote:
> > >
> > > > “there will be over 1000 of them [fields]”
> > > >
> > > > This is often a red flag in my experience. Solr will handle that many
> > > > fields; I’ve seen many more. But it is often a result of “database
> > > > thinking”, i.e. your mental model of the data comes from a DB
> > > > perspective rather than a search perspective.
> > > >
> > > > It’s unwieldy to have that many fields. Obviously I don’t know the
> > > > particulars of your app, and maybe that’s the best design. But
> > > > particularly if many of the fields are sparsely populated, i.e. only a
> > > > small percentage of the documents in your corpus have any value for a
> > > > given field, then taking a step back and looking at the design might
> > > > save you some grief down the line.
> > > >
> > > > For instance, I’ve seen designs where instead of
> > > > field1:some_value
> > > > field2:other_value….
> > > >
> > > > you use a single field with _tokens_ like:
> > > > field:field1_some_value
> > > > field:field2_other_value
> > > >
> > > > That reduces complexity and improves performance.
> > > >
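A minimal Java sketch of that token idea, done on the client before
indexing. The helper names (toTokens, clause) are hypothetical, and it
assumes the single field (called field, as in the example above) is a
non-tokenized string field such as StrField, so each token is indexed as
one term. Values containing spaces or query-syntax characters would need
escaping at query time.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class FieldTokens {

    // Turns field/value pairs such as field1=some_value into single tokens
    // like "field1_some_value", all destined for one multi-valued field.
    static List<String> toTokens(Map<String, String> pairs) {
        List<String> tokens = new ArrayList<>();
        for (Map.Entry<String, String> e : pairs.entrySet()) {
            tokens.add(e.getKey() + "_" + e.getValue());
        }
        return tokens;
    }

    // Builds the query clause that matches one original field/value pair.
    static String clause(String tokenField, String field, String value) {
        return tokenField + ":" + field + "_" + value;
    }

    public static void main(String[] args) {
        Map<String, String> row = new LinkedHashMap<>();
        row.put("field1", "some_value");
        row.put("field2", "other_value");
        System.out.println(toTokens(row));
        // [field1_some_value, field2_other_value]
        System.out.println(clause("field", "field1", "some_value"));
        // field:field1_some_value
    }
}
```

One design caveat: with "_" as the separator, a field name that itself
contains "_" can make tokens ambiguous, so a separator that never occurs
in field names is the safer choice.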
> > > > Anyway, just a thought you might want to consider.
> > > >
> > > > Best,
> > > > Erick
> > > >
> > > > > On Sep 16, 2020, at 9:31 PM, Steven White <swhite4...@gmail.com> wrote:
> > > > >
> > > > > Hi everyone,
> > > > >
> > > > > I figured it out.  It is as simple as creating a List<String> and
> > > > > passing it as the value argument of the SolrInputDocument.addField() API.
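Steven's approach can be sketched in plain Java. To keep the sketch
self-contained and runnable without SolrJ on the classpath, the document is
modeled as a Map from field name to values, a hypothetical stand-in for
SolrInputDocument; with SolrJ, the values would be read through
doc.getFieldNames() and doc.getFieldValues(name) and written through
doc.addField("CatchAll", values), as Steven describes. The copyToCatchAll
helper and its removeSources flag (which covers Erick's suggestion of
combining the fields and deleting the originals) are illustrative
assumptions, and CatchAll is assumed to be declared multiValued in the
schema.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class CatchAllCopy {

    // Copies the values of every field except `dest` and the `exclude` set
    // into `dest`, replacing anything already stored under `dest`. With
    // removeSources=true the copied source fields are deleted afterwards
    // ("combine into one field and drop the originals").
    // `doc` is a hypothetical stand-in for SolrInputDocument.
    static List<String> copyToCatchAll(Map<String, List<Object>> doc,
                                       String dest,
                                       Set<String> exclude,
                                       boolean removeSources) {
        List<String> values = new ArrayList<>();
        // Iterate over a snapshot of the keys so fields can be removed safely.
        for (String field : new ArrayList<>(doc.keySet())) {
            if (field.equals(dest) || exclude.contains(field)) {
                continue;
            }
            for (Object value : doc.get(field)) {
                values.add(String.valueOf(value));
            }
            if (removeSources) {
                doc.remove(field);
            }
        }
        // In SolrJ this step would be doc.addField(dest, values); note that
        // addField appends to existing values instead of replacing them.
        doc.put(dest, new ArrayList<Object>(values));
        return values;
    }

    public static void main(String[] args) {
        Map<String, List<Object>> doc = new LinkedHashMap<>();
        doc.put("id", new ArrayList<Object>(List.of("doc-1")));
        doc.put("title", new ArrayList<Object>(List.of("Solr copyField")));
        doc.put("tags", new ArrayList<Object>(List.of("search", "solrj")));

        List<String> catchAll = copyToCatchAll(doc, "CatchAll", Set.of("id"), false);
        System.out.println(catchAll);     // [Solr copyField, search, solrj]
        System.out.println(doc.keySet()); // [id, title, tags, CatchAll]
    }
}
```

Unlike <copyField/>, which Solr applies on the server for every update,
this runs on the client, so every indexing path has to call it.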
> > > > >
> > > > > Thanks,
> > > > >
> > > > > Steven
> > > > >
> > > > >
> > > > > On Wed, Sep 16, 2020 at 9:13 PM Steven White <swhite4...@gmail.com> wrote:
> > > > >
> > > > >> Hi everyone,
> > > > >>
> > > > >> I want to avoid creating <copyField dest="CatchAll"
> > > > >> source="OneFieldOfMany"/> entries in my schema (there will be over
> > > > >> 1000 of them, maybe more, so managing them will be a pain).  Instead,
> > > > >> I want to use the SolrJ API to do what <copyField/> does.  Any example
> > > > >> of how I can do this?  If there is an example online, that would be
> > > > >> great.
> > > > >>
> > > > >> Thanks in advance.
> > > > >>
> > > > >> Steven
> > > > >>
> > > >
> > > >
> > >
> >
>
