The script can actually be written in any number of scripting languages
(Python, Groovy, JavaScript, etc.), but Alexandre’s comments about JavaScript
are well taken.

It all depends on whether you ever want to search the fields individually.
If you do, you need to have them in your index in addition to the copyField.
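
For what it's worth, here is a rough client-side sketch of that choice in Java. It is only an illustration under some assumptions: a plain Map stands in for SolrInputDocument so the snippet runs without SolrJ on the classpath, and the names CatchAllSketch, withCatchAll and keepOriginals are made up for this example. With SolrJ you would hand the collected list to SolrInputDocument.addField(), as Steven describes further down.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of SolrJ. A Map stands in for SolrInputDocument.
class CatchAllSketch {
    // Catch-all field name as used in the schema snippet quoted in this thread.
    static final String CATCH_ALL = "CatchAll";

    /**
     * Builds a document from one DB row. Every non-empty value is collected
     * into the CatchAll list. The original fields are kept only when
     * keepOriginals is true, i.e. when you want to search them individually.
     */
    static Map<String, Object> withCatchAll(Map<String, String> row, boolean keepOriginals) {
        List<String> all = new ArrayList<>();
        Map<String, Object> doc = new LinkedHashMap<>();
        for (Map.Entry<String, String> e : row.entrySet()) {
            String value = e.getValue();
            if (value == null || value.isEmpty()) {
                continue; // sparse rows: skip fields this document does not use
            }
            all.add(value);
            if (keepOriginals) {
                doc.put(e.getKey(), value);
            }
        }
        // With SolrJ this would be something like: solrDoc.addField(CATCH_ALL, all);
        doc.put(CATCH_ALL, all);
        return doc;
    }

    public static void main(String[] args) {
        Map<String, String> row = new LinkedHashMap<>();
        row.put("title", "Solr in Action");
        row.put("author", "");
        row.put("year", "2014");
        System.out.println(withCatchAll(row, true));  // originals plus CatchAll
        System.out.println(withCatchAll(row, false)); // CatchAll only
    }
}
```

A real version would presumably always keep the unique key field, even when the other originals are dropped.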

> On Sep 17, 2020, at 1:37 PM, Walter Underwood <wun...@wunderwood.org> wrote:
> 
> If you want to ignore a field being sent to Solr, you can set indexed=false
> and stored=false for that field in schema.xml. It will take up room in
> schema.xml but zero room on disk.
> 
> wunder
> Walter Underwood
> wun...@wunderwood.org
> http://observer.wunderwood.org/  (my blog)
> 
>> On Sep 17, 2020, at 10:23 AM, Alexandre Rafalovitch <arafa...@gmail.com> 
>> wrote:
>> 
>> Solr has a whole pipeline that you can run during document ingestion, before
>> the actual indexing happens. It is called an Update Request Processor (URP)
>> chain and is defined in solrconfig.xml or in an override file. Obviously, since
>> you are indexing from a SolrJ client, you have even more flexibility, but it
>> is good to know about anyway.
>> 
>> You can read all about it at:
>> https://lucene.apache.org/solr/guide/8_6/update-request-processors.html and
>> see the extensive list of processors you can leverage. The specific
>> mentioned one is this one:
>> https://lucene.apache.org/solr/8_6_0//solr-core/org/apache/solr/update/processor/StatelessScriptUpdateProcessorFactory.html
>> 
>> Just a word of warning: the stateless script URP is typically used with
>> JavaScript, which is getting to be a bit of a complicated story as the
>> underlying JVM is upgraded (the Nashorn JavaScript engine was deprecated in
>> JDK 11 and removed in JDK 15). So if one of the simpler URPs, or a chain of
>> them, will do the job, that may be a better path to take.
>> 
>> Regards,
>>  Alex.
>> 
>> 
>> On Thu, 17 Sep 2020 at 13:13, Steven White <swhite4...@gmail.com> wrote:
>> 
>>> Thanks Erick.  Where can I learn more about the "stateless script update
>>> processor factory"?  I don't know what you mean by this.
>>> 
>>> Steven
>>> 
>>> On Thu, Sep 17, 2020 at 1:08 PM Erick Erickson <erickerick...@gmail.com>
>>> wrote:
>>> 
>>>> 1000 fields is fine, you'll waste some cycles on bookkeeping, but I really
>>>> doubt you'll notice. That said, are these fields used for searching?
>>>> Because you do have control over what goes into the index if you can put a
>>>> "stateless script update processor factory" in your update chain. There you
>>>> can do whatever you want, including combine all the fields into one and
>>>> delete the original fields. There's no point in having your index cluttered
>>>> with unused fields. OTOH, it may not be worth the effort just to satisfy my
>>>> sense of aesthetics 😉
>>>> 
>>>> On Thu, Sep 17, 2020, 12:59 Steven White <swhite4...@gmail.com> wrote:
>>>> 
>>>>> Hi Eric,
>>>>> 
>>>>> Yes, this is coming from a DB.  Unfortunately I have no control over the
>>>>> list of fields.  Out of the 1000 fields that there may be, no document that
>>>>> gets indexed into Solr will use more than about 50, and since I'm copying
>>>>> the values of those fields to the catch-all field and the catch-all field
>>>>> is my default search field, I don't expect any problem from having 1000
>>>>> fields in Solr's schema, or should I?
>>>>> 
>>>>> Thanks
>>>>> 
>>>>> Steven
>>>>> 
>>>>> 
>>>>> On Thu, Sep 17, 2020 at 8:23 AM Erick Erickson <erickerick...@gmail.com>
>>>>> wrote:
>>>>> 
>>>>>> “there over 1000 of them[fields]”
>>>>>> 
>>>>>> This is often a red flag in my experience. Solr will handle that many
>>>>>> fields, I’ve seen many more. But this is often a result of
>>>>>> “database thinking”, i.e. your mental model of all this data comes
>>>>>> from a DB perspective rather than a search perspective.
>>>>>> 
>>>>>> It’s unwieldy to have that many fields. Obviously I don’t know the
>>>>>> particulars of your app, and maybe that’s the best design. Particularly
>>>>>> if many of the fields are sparsely populated, i.e. only a small
>>>>>> percentage of the documents in your corpus have any value for that
>>>>>> field, then taking a step back and looking at the design might save you
>>>>>> some grief down the line.
>>>>>> 
>>>>>> For instance, I’ve seen designs where instead of
>>>>>> field1:some_value
>>>>>> field2:other_value….
>>>>>> 
>>>>>> you use a single field with _tokens_ like:
>>>>>> field:field1_some_value
>>>>>> field:field2_other_value
>>>>>> 
>>>>>> that drops the complexity and increases performance.
>>>>>> 
>>>>>> Anyway, just a thought you might want to consider.
>>>>>> 
>>>>>> Best,
>>>>>> Erick
>>>>>> 
>>>>>>> On Sep 16, 2020, at 9:31 PM, Steven White <swhite4...@gmail.com> wrote:
>>>>>>> 
>>>>>>> Hi everyone,
>>>>>>> 
>>>>>>> I figured it out.  It is as simple as creating a List<String> and using
>>>>>>> that as the value part for the SolrInputDocument.addField() API.
>>>>>>> 
>>>>>>> Thanks,
>>>>>>> 
>>>>>>> Steven
>>>>>>> 
>>>>>>> 
>>>>>>> On Wed, Sep 16, 2020 at 9:13 PM Steven White <swhite4...@gmail.com>
>>>>>>> wrote:
>>>>>>> 
>>>>>>>> Hi everyone,
>>>>>>>> 
>>>>>>>> I want to avoid creating a <copyField dest="CatchAll"
>>>>>>>> source="OneFieldOfMany"/> in my schema (there will be over 1000 of them
>>>>>>>> and maybe more, so managing it will be a pain).  Instead, I want to use
>>>>>>>> the SolrJ API to do what <copyField/> does.  Any example of how I can do
>>>>>>>> this?  If there is an example online, that would be great.
>>>>>>>> 
>>>>>>>> Thanks in advance.
>>>>>>>> 
>>>>>>>> Steven
>>>>>>>> 
>>>>>> 
>>>>>> 
>>>>> 
>>>> 
>>> 
> 
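
The single-field token idea from the quoted thread above (field:field1_some_value instead of field1:some_value) might be sketched in Java roughly as follows. This is only a sketch under assumptions: FieldTokenSketch and toTokens are hypothetical names, the "_" separator just mirrors the example in the thread, and it assumes the target field's type keeps each token intact (for instance a string-like type) rather than splitting it at analysis time.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper illustrating the "one field, many tokens" design.
class FieldTokenSketch {
    /** Turns field/value pairs into tokens of the form fieldName_value. */
    static List<String> toTokens(Map<String, String> fields) {
        List<String> tokens = new ArrayList<>();
        for (Map.Entry<String, String> e : fields.entrySet()) {
            String value = e.getValue();
            if (value == null || value.isEmpty()) {
                continue; // sparsely populated fields contribute no token
            }
            tokens.add(e.getKey() + "_" + value);
        }
        return tokens;
    }

    public static void main(String[] args) {
        Map<String, String> fields = new LinkedHashMap<>();
        fields.put("field1", "some_value");
        fields.put("field2", "other_value");
        fields.put("field3", ""); // empty, so skipped
        // The resulting list would be the multi-valued content of the single field.
        System.out.println(toTokens(fields));
    }
}
```

A search for one original field then becomes a query for the matching token in that single field.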
