That's great!! Got it. Thank you very much.
On Wed, Nov 22, 2017 at 5:07 PM, Emir Arnautović <emir.arnauto...@sematext.com> wrote:

> Hi Roxana,
> The idea with the update request processor is to have the following parameters:
> * inputField - the document field with the text to analyse
> * sharedAnalysis - the field type with the shared analysis definition
> * targetFields - a comma-separated list of fields where the results should be stored
> * fieldSpecificAnalysis - a comma-separated list of field types that define the specifics for each field (reusing the schema will leave an extra tokenizer that should be ignored)
>
> Your update processor uses TeeSinkTokenFilter to create tokens for each field, but you do not write those tokens to the index. You add new fields to the document, where each token is a new value (or you can concatenate them and use a whitespace tokenizer in the indexing analysis chain of the target field). You can then remove inputField from the document.
>
> HTH,
> Emir
> --
> Monitoring - Log Management - Alerting - Anomaly Detection
> Solr & Elasticsearch Consulting Support Training - http://sematext.com/
>
>> On 22 Nov 2017, at 17:46, Roxana Danger <roxana.dan...@gmail.com> wrote:
>>
>> Hi Emir,
>> In this case, I need more control at the Lucene level, so I have to use the Lucene IndexWriter directly. So I cannot use Solr for importing.
>> Or, is there any way I can add a TokenStream to a SolrInputDocument (is there any other class exposed by Solr during indexing that I can use for this purpose)?
>> Am I correct, or am I still missing something?
>> Thank you.
>>
>> On Wed, Nov 22, 2017 at 11:33 AM, Emir Arnautović <emir.arnauto...@sematext.com> wrote:
>>
>>> Hi Roxana,
>>> I think you can use https://lucene.apache.org/core/5_4_0/analyzers-common/org/apache/lucene/analysis/sinks/TeeSinkTokenFilter.html as suggested earlier.
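The processor flow Emir outlines above can be sketched in plain Python (this is a conceptual illustration, not Solr's UpdateRequestProcessor API; the `analyse` function, the document layout, and all field names are assumptions for illustration): analyse `inputField` once, store the filtered results as values of new multi-valued fields, then drop `inputField` from the document.

```python
# Hypothetical sketch of Emir's update-processor flow, in plain Python.
# NOT the Solr/Lucene API: analyse() stands in for the shared analysis
# chain, and a document is modelled as a plain dict.

def analyse(text):
    """Stand-in for the shared analysis chain: emits (term, type) pairs."""
    # A real chain would tokenize and tag; here the types are hard-coded.
    tagged = {"jumps": "VERB", "lazy": "ADJ", "dog": "NOUN"}
    return [(t, tagged.get(t, "NOUN")) for t in text.split()]

def process(doc, input_field, target_fields):
    """target_fields maps each target field name to the token type it keeps."""
    tokens = analyse(doc[input_field])
    for field, wanted in target_fields.items():
        # Each surviving token becomes one value of a multi-valued field.
        doc[field] = [term for term, ttype in tokens if ttype == wanted]
    del doc[input_field]  # the raw input text is no longer needed
    return doc

doc = {"id": "1", "body": "lazy dog jumps"}
process(doc, "body", {"verbs": "VERB", "adjectives": "ADJ"})
print(doc)  # {'id': '1', 'verbs': ['jumps'], 'adjectives': ['lazy']}
```

In a real processor the `sharedAnalysis` field type would supply the chain that `analyse` stands in for, and the per-field filtering would come from `fieldSpecificAnalysis`.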
>>>
>>> HTH,
>>> Emir
>>> --
>>> Monitoring - Log Management - Alerting - Anomaly Detection
>>> Solr & Elasticsearch Consulting Support Training - http://sematext.com/
>>>
>>>> On 22 Nov 2017, at 11:43, Roxana Danger <roxana.dan...@gmail.com> wrote:
>>>>
>>>> Hi Emir,
>>>> Many thanks for your reply.
>>>> The UpdateProcessor can do this work, but is analyzer.reusableTokenStream
>>>> <https://lucene.apache.org/core/3_0_3/api/core/org/apache/lucene/analysis/Analyzer.html#reusableTokenStream(java.lang.String, java.io.Reader)>
>>>> the way to obtain a previously generated token stream? Is it guaranteed to give access to the existing token stream rather than reconstructing it?
>>>> Thanks,
>>>> Roxana
>>>>
>>>> On Wed, Nov 22, 2017 at 10:26 AM, Emir Arnautović <emir.arnauto...@sematext.com> wrote:
>>>>
>>>>> Hi Roxana,
>>>>> I don’t think that it is possible. In some cases (yours seems like a good fit) you could create a custom update request processor that does the shared analysis (you can have it defined in the schema) and, after analysis, uses those tokens to create new values for the two fields and removes the source value (or flags it as ignored in the schema).
>>>>>
>>>>> HTH,
>>>>> Emir
>>>>> --
>>>>> Monitoring - Log Management - Alerting - Anomaly Detection
>>>>> Solr & Elasticsearch Consulting Support Training - http://sematext.com/
>>>>>
>>>>>> On 22 Nov 2017, at 11:09, Roxana Danger <roxana.dan...@gmail.com> wrote:
>>>>>>
>>>>>> Hello all,
>>>>>>
>>>>>> I would like to reuse the token stream generated for one field to create a new token stream (adding a few filters to the existing one) for another field, without executing the whole analysis again.
>>>>>>
>>>>>> The particular application is:
>>>>>> - I have a field *tokens* that uses an analyzer that generates the tokens (and maintains the token type attributes).
>>>>>> - I would like to have two new fields: *verbs* and *adjectives*. These should reuse the token stream generated for the field *tokens* and filter the verbs and adjectives into the respective fields.
>>>>>>
>>>>>> Is this feasible? How should it be implemented?
>>>>>>
>>>>>> Many thanks.
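The tee/sink pattern that the thread converges on (one shared analysis, tokens fanned out to per-field sinks that filter by token type, as Lucene's TeeSinkTokenFilter does) can be illustrated in plain Python. This is a conceptual sketch, not the Lucene API; the token types and the fake tagger are assumptions for illustration.

```python
# Conceptual sketch of the tee/sink pattern: run the shared analysis
# once, then replay every token into several per-field "sinks", each
# keeping only the token types it wants. NOT the Lucene API.

def shared_analysis(text):
    """Stand-in for the shared analysis chain: emits (term, type) tokens."""
    # A real chain would tokenize and POS-tag; here the types are hard-coded.
    tagged = {"run": "VERB", "quick": "ADJ", "fox": "NOUN"}
    for term in text.split():
        yield term, tagged.get(term, "NOUN")

def tee(tokens, sinks):
    """Replay each token into every registered sink, like TeeSinkTokenFilter."""
    out = {name: [] for name in sinks}
    for term, ttype in tokens:
        for name, accept in sinks.items():
            if accept(ttype):
                out[name].append(term)
    return out

# Field-specific filters: each target field keeps only certain token types.
sinks = {
    "tokens": lambda t: True,           # everything
    "verbs": lambda t: t == "VERB",     # verbs only
    "adjectives": lambda t: t == "ADJ", # adjectives only
}

fields = tee(shared_analysis("quick fox run"), sinks)
print(fields["verbs"])       # ['run']
print(fields["adjectives"])  # ['quick']
```

The key property, matching the original question, is that `shared_analysis` runs exactly once per document regardless of how many target fields consume its tokens.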