Re: Indexing a token to a different field in a custom filter
I need to index the processed token into a different field (e.g. stanbolResponse) of the same document that is being indexed. I am looking for a way to retrieve the document ID from the TokenStream so that I can update that same document with the new field values. (The sample code in my previous message adds a new document instead of updating the same one.) Any pointers, please?

Thanks,
Dileepa

On Tue, Nov 12, 2013 at 12:01 PM, Dileepa Jayakody <dileepajayak...@gmail.com> wrote:
> Hi All, In my custom filter, I need to index the processed token into a different field. [...]
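Part of the difficulty with "updating the same document" from SolrJ is that a plain add of a document whose id already exists replaces the earlier document as a whole; it does not merge the new field into the old one. The stdlib-only sketch below models that behaviour with a HashMap so it runs without Solr. `ToyIndex` and its methods are hypothetical illustrations, not part of SolrJ.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical in-memory stand-in for an index keyed by uniqueKey ("id").
// It mimics a plain add: a document with an existing id replaces the
// previous document in full; fields are NOT merged.
final class ToyIndex {
    private final Map<String, Map<String, Object>> docs = new HashMap<>();

    void add(Map<String, Object> doc) {
        String id = (String) doc.get("id");
        if (id == null) {
            throw new IllegalArgumentException("document has no id");
        }
        // Store a copy so later changes to the caller's map do not leak in.
        docs.put(id, new LinkedHashMap<>(doc));
    }

    Map<String, Object> get(String id) {
        return docs.get(id);
    }

    // Convenience builder: doc("id", "id1", "field", value, ...).
    static Map<String, Object> doc(Object... keyValues) {
        Map<String, Object> d = new LinkedHashMap<>();
        for (int i = 0; i + 1 < keyValues.length; i += 2) {
            d.put((String) keyValues[i], keyValues[i + 1]);
        }
        return d;
    }
}
```

Adding `doc("id", "id1", "stanbolResponse", ...)` after `doc("id", "id1", "description", ...)` leaves a stored document without `description`, which is why sending a second document from inside the filter does not amount to updating the one being indexed.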
Re: Indexing a token to a different field in a custom filter
Hi,

Maybe the synonym filter (http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.SynonymFilterFactory) is the model to look at. You can start by creating a new Stanbol-enhanced field type in your schema. Following the parallel, for synonyms we could have this schema:

    ...
    <fieldType name="synonymtext" class="solr.TextField" positionIncrementGap="100">
      <analyzer>
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
      </analyzer>
    </fieldType>
    ...
    <field name="id" type="string" indexed="true" stored="true" required="true"/>
    <field name="description" type="synonymtext" indexed="true" stored="true" multiValued="true"/>
    ...

In the case of Stanbol:

    ...
    <fieldType name="stanboltext" class="solr.TextField" positionIncrementGap="100">
      <analyzer>
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <filter class="StanbolFilterFactory" [your Stanbol filter parameters here]/>
      </analyzer>
    </fieldType>
    ...
    <field name="id" type="string" indexed="true" stored="true" required="true"/>
    <field name="description" type="stanboltext" indexed="true" stored="true" multiValued="true"/>
    ...

The StanbolFilterFactory is then in charge of connecting to Stanbol and enhancing the data coming from the WhitespaceTokenizerFactory, producing output that other filters can consume.

How do you index your data, then? Just send your doc:

    id: your id
    description: the data to be enhanced

Another path you can follow is to imitate the behaviour of copyField (http://wiki.apache.org/solr/SchemaXml#Copy_Fields) in a more sophisticated fashion, i.e. copy, enhance and put the result in a new field. Then you can have the following schema:

    ...
    <fieldType name="text" class="solr.TextField" positionIncrementGap="100">
      <analyzer>
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
      </analyzer>
    </fieldType>
    ...
    <field name="id" type="string" indexed="true" stored="true" required="true"/>
    <field name="description" type="text" indexed="true" stored="true" multiValued="true"/>
    <field name="enhancedDescription" type="text" indexed="true" stored="true" multiValued="true"/>
    <copyEnhanceField source="description" dest="enhancedDescription"/>

The copyEnhanceField is now in charge of taking the original field, sending it to Stanbol, getting the response and writing it into the new field.

How do you index your data, then? Just send your doc:

    id: your id
    description: the original data

And you will get in Solr:

    id: your id
    description: the original data
    enhancedDescription: the enhanced data

Regards
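copyEnhanceField is a proposed element, not a built-in schema feature, so its "copy, enhance and put in a new field" logic would have to live in custom code. A minimal stdlib-only sketch of that step follows, assuming a document is a map from field name to a list of values and that the Stanbol call is passed in as a function. `CopyEnhance` and the `enhancer` parameter are hypothetical; a real enhancer would make an HTTP request to Stanbol.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.function.UnaryOperator;

// Hypothetical helper illustrating "copy, enhance and put in a new field".
final class CopyEnhance {
    private CopyEnhance() {}

    // Runs every value of `source` through `enhancer` and appends the result
    // to `dest` (like a multiValued field). The source field is left untouched.
    static void apply(Map<String, List<String>> doc, String source, String dest,
                      UnaryOperator<String> enhancer) {
        List<String> values = doc.get(source);
        if (values == null || values.isEmpty()) {
            return; // nothing to enhance
        }
        List<String> out = doc.computeIfAbsent(dest, k -> new ArrayList<>());
        // Iterate over a copy so source == dest cannot cause a ConcurrentModificationException.
        for (String v : new ArrayList<>(values)) {
            out.add(enhancer.apply(v));
        }
    }
}
```

Passing the enhancer as a function keeps the field-copying logic testable without a running Stanbol instance.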
Re: Indexing a token to a different field in a custom filter
Whether or not what Alvaro outlined works for you, do NOT commit after every document if you use SolrJ. The commit will hurt performance much more than the HTTP overhead. And you can always batch up, say, 1,000 documents and use the server.add(doclist) method.

Overall, worrying about HTTP overhead is usually a red herring.

Best,
Erick

On Tue, Nov 12, 2013 at 3:20 AM, Alvaro Cabrerizo <topor...@gmail.com> wrote:
> Hi, Maybe the synonym filter is the mirror you can look in. [...]
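The batching advice above can be sketched without Solr as a small buffer that hands items to a sink every N additions. In SolrJ the sink would be something like `docs -> server.add(docs)`, followed by a single `server.commit()` after the final flush. The `Batcher` class and the batch size are illustrative assumptions, not part of SolrJ.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical buffer that passes items to a sink in fixed-size batches,
// e.g. 1,000 SolrInputDocuments per server.add(...) call.
final class Batcher<T> implements AutoCloseable {
    private final int batchSize;
    private final Consumer<List<T>> sink;
    private final List<T> buffer = new ArrayList<>();

    Batcher(int batchSize, Consumer<List<T>> sink) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        this.batchSize = batchSize;
        this.sink = sink;
    }

    void add(T item) {
        buffer.add(item);
        if (buffer.size() >= batchSize) {
            flush();
        }
    }

    // Sends any buffered items; does nothing when the buffer is empty.
    void flush() {
        if (!buffer.isEmpty()) {
            sink.accept(new ArrayList<>(buffer)); // hand over a copy, then reuse the buffer
            buffer.clear();
        }
    }

    @Override
    public void close() {
        flush(); // final partial batch; commit once after this, not per document
    }
}
```

Because SolrJ's `add()` declares checked exceptions, a real sink lambda would need to catch them and rethrow, for example wrapped in a RuntimeException.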
Re: Indexing a token to a different field in a custom filter
Any kind of cross-field processing is best done in an update processor. There are a lot of built-in update processors as well as a JavaScript script update processor.

-- Jack Krupansky

-----Original Message----- From: Dileepa Jayakody Sent: Tuesday, November 12, 2013 1:31 AM To: solr-user@lucene.apache.org Subject: Indexing a token to a different field in a custom filter
> Hi All, In my custom filter, I need to index the processed token into a different field. [...]
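In Solr, an update processor is a subclass of `UpdateRequestProcessor` whose `processAdd(AddUpdateCommand cmd)` sees the whole incoming document (via `cmd.getSolrInputDocument()`). It can read one field, write another on the same document, and then pass it to the next processor in the chain. The plain-Java sketch below only mimics that chain-of-responsibility shape so it runs without Solr on the classpath. `DocProcessor`, `StanbolEnhanceProcessor`, `CollectingProcessor` and the `enhancer` function are hypothetical names.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.UnaryOperator;

// Hypothetical stand-in for Solr's UpdateRequestProcessor: each processor
// sees the whole document and forwards it to the next one in the chain.
abstract class DocProcessor {
    protected final DocProcessor next;

    DocProcessor(DocProcessor next) {
        this.next = next;
    }

    void processAdd(Map<String, Object> doc) {
        if (next != null) {
            next.processAdd(doc);
        }
    }
}

// Reads one field and writes the enhanced value to another field of the SAME document.
final class StanbolEnhanceProcessor extends DocProcessor {
    private final String source;
    private final String dest;
    private final UnaryOperator<String> enhancer; // a real one would call Stanbol over HTTP

    StanbolEnhanceProcessor(String source, String dest,
                            UnaryOperator<String> enhancer, DocProcessor next) {
        super(next);
        this.source = source;
        this.dest = dest;
        this.enhancer = enhancer;
    }

    @Override
    void processAdd(Map<String, Object> doc) {
        Object value = doc.get(source);
        if (value instanceof String) {
            doc.put(dest, enhancer.apply((String) value));
        }
        super.processAdd(doc); // hand the modified document on
    }
}

// Terminal processor standing in for the step that actually indexes the document.
final class CollectingProcessor extends DocProcessor {
    final Map<String, Map<String, Object>> indexed = new LinkedHashMap<>();

    CollectingProcessor() {
        super(null);
    }

    @Override
    void processAdd(Map<String, Object> doc) {
        indexed.put((String) doc.get("id"), new LinkedHashMap<>(doc));
    }
}
```

Since the new field is added before the document reaches the indexing step, no second request and no document ID lookup are needed.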
Re: Indexing a token to a different field in a custom filter
Thanks all for your valuable inputs. I looked at the suggested solutions, and I too feel that a *custom update processor* during indexing will be the best way to handle the content field: change its value and store the result in another field.

Do I only need to change the request handler below to intercept all incoming documents and perform my custom analysis during indexing, or do I need to change other request handlers as well?

    <requestHandler name="/update" class="solr.UpdateRequestHandler">

Thanks,
Dileepa

On Tue, Nov 12, 2013 at 7:37 PM, Jack Krupansky <j...@basetechnology.com> wrote:
> Any kind of cross-field processing is best done in an update processor. There are a lot of built-in update processors as well as a JavaScript script update processor. [...]
Indexing a token to a different field in a custom filter
Hi All,

In my custom filter, I need to index the processed token into a different field. The processed token is a Stanbol enhancement response. The only solution I have found so far is to use a Solr client (SolrJ) to add a new document with my processed field to Solr. Below is the sample code segment:

    import java.io.IOException;
    import org.apache.solr.client.solrj.SolrServer;
    import org.apache.solr.client.solrj.SolrServerException;
    import org.apache.solr.client.solrj.impl.HttpSolrServer;
    import org.apache.solr.common.SolrInputDocument;

    SolrServer server = new HttpSolrServer("http://localhost:8983/solr/");
    SolrInputDocument doc1 = new SolrInputDocument();
    doc1.addField("id", "id1", 1.0f);
    // response holds the Stanbol enhancement result for the current token
    doc1.addField("stanbolResponse", response);
    try {
        server.add(doc1);   // one HTTP request per document
        server.commit();    // plus one commit request per document
    } catch (SolrServerException | IOException e) {
        e.printStackTrace();
    }

This mechanism requires a new HTTP call to the local Solr server for every token I process for the stanbolRequest field, and I feel it is not very efficient. Is there an alternative way to invoke an update request that adds a new field to the document being indexed from within the filter (without making an explicit HTTP call using SolrJ)?

Thanks,
Dileepa