RE: Need additional data processing in Data Import Handler prior to indexing
Would an onImportEnd event listener serve your needs? See http://wiki.apache.org/solr/DataImportHandler#EventListeners

James Dyer
Ingram Content Group
(615) 213-4311

-----Original Message-----
From: Dileepa Jayakody [mailto:dileepajayak...@gmail.com]
Sent: Tuesday, October 29, 2013 3:48 PM
To: solr-user@lucene.apache.org
Subject: Need additional data processing in Data Import Handler prior to indexing

Hi All,

I'm a newbie to Solr, and I have a requirement to import data from a MySQL database, enhance the imported content to identify the Persons mentioned, and index them as a separate field in Solr along with the other fields defined for the original db query. I'm using Apache Stanbol [1] for the content enhancement, and I can get enhancement results for 'Person' type data in the content.

The data flow will be:

mysql-db -> Solr data-import handler -> Stanbol enhancer -> Solr index

For this requirement I need to perform additional processing in the data import handler prior to indexing: send a request to Stanbol and process the enhancement response. I found some related examples that customize the query results of the MySQL data import in db-data-config.xml by using a transformer script, but I'm not sure whether sending a request to Stanbol and processing the response can be achieved with a simple JavaScript.

Is there a better way of achieving my requirement? Maybe writing a custom filter in Solr? Please share your thoughts. I'd appreciate any pointers, as I'm a beginner with Solr.

Thanks,
Dileepa

[1] https://stanbol.apache.org
Re: Need additional data processing in Data Import Handler prior to indexing
Hi Dileepa,

You can write your own Transformers in Java. If it doesn't make sense to run Stanbol calls in a Transformer, maybe setting up a web service that grabs a record out of MySQL, sends the data to Stanbol, and displays the results could be used in conjunction with HttpDataSource rather than JdbcDataSource.

http://wiki.apache.org/solr/DIHCustomTransformer
http://wiki.apache.org/solr/DataImportHandler#Usage_with_XML.2FHTTP_Datasource

Michael Della Bitta
Applications Developer
appinions inc.
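As a rough illustration of the custom Transformer suggestion, here is a minimal sketch. It relies on the convention described on the DIHCustomTransformer wiki page that DIH can call a `transformRow(Map<String, Object>)` method by reflection, so the class compiles without the Solr jars. The class name, the `content` and `persons` column names, and the stubbed extractor are assumptions made for this example, not anything from the thread; the real Stanbol call would replace the stub.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Hypothetical DIH transformer sketch. When deployed it would be declared
// "public" in its own file so that DIH can instantiate it by class name.
class PersonEnhancingTransformer {

    // Maps raw text to the person names found in it. In a real setup this
    // would wrap the HTTP call to Stanbol; here it is pluggable for testing.
    private final Function<String, List<String>> personExtractor;

    // No-arg constructor for DIH, which creates transformers reflectively.
    public PersonEnhancingTransformer() {
        this(PersonEnhancingTransformer::extractPersonsStub);
    }

    PersonEnhancingTransformer(Function<String, List<String>> personExtractor) {
        this.personExtractor = personExtractor;
    }

    // Called once per row read from the data source. "content" and "persons"
    // are assumed column/field names, not ones given in the thread.
    public Object transformRow(Map<String, Object> row) {
        Object content = row.get("content");
        if (content != null) {
            List<String> persons = personExtractor.apply(content.toString());
            if (!persons.isEmpty()) {
                // A List value can be indexed into a multi-valued field.
                row.put("persons", new ArrayList<>(persons));
            }
        }
        return row;
    }

    // Placeholder that finds nothing; swap in the real Stanbol request.
    static List<String> extractPersonsStub(String text) {
        return List.of();
    }
}
```

Such a class would be referenced from the entity in db-data-config.xml through its `transformer` attribute, using the fully qualified class name, and `persons` would need a matching multi-valued field in the schema.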
Re: Need additional data processing in Data Import Handler prior to indexing
It's also possible to combine an Update Request Processor with DIH. That way, if a debug entry needs to be inserted, it could go through the same Stanbol process.

Just define a processing chain in the DIH handler and write a custom URP to call out to the Stanbol web service. You have access to the full record in the URP, so you can add, delete, or change fields at will.

Regards,
Alex.

Personal website: http://www.outerthoughts.com/
LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
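Whichever hook is used (Transformer or URP), the call out to the Stanbol web service might look roughly like the sketch below. It assumes Java 11 or later for `java.net.http`, a Stanbol launcher whose enhancer endpoint is at `http://localhost:8080/enhancer`, and JSON output requested through the `Accept` header; the class and method names are made up for illustration. Parsing the response for 'Person' entities is left out.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;

// Hypothetical helper that posts plain text to a Stanbol enhancer endpoint.
class StanbolClient {

    private final URI enhancerUri;

    // enhancerUrl is an assumption, e.g. "http://localhost:8080/enhancer".
    StanbolClient(String enhancerUrl) {
        this.enhancerUri = URI.create(enhancerUrl);
    }

    // Builds the POST request without sending it, so it can be inspected.
    HttpRequest buildRequest(String text) {
        return HttpRequest.newBuilder(enhancerUri)
                .header("Content-Type", "text/plain; charset=UTF-8")
                .header("Accept", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(text, StandardCharsets.UTF_8))
                .build();
    }

    // Sends the text and returns the raw enhancement response body.
    String enhance(String text) throws IOException, InterruptedException {
        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(buildRequest(text), HttpResponse.BodyHandlers.ofString());
        if (response.statusCode() != 200) {
            throw new IOException("Stanbol returned HTTP " + response.statusCode());
        }
        return response.body();
    }
}
```

Since this call runs once per record, a slow or unavailable Stanbol instance would slow down or fail the whole import, so timeouts and error handling deserve some thought before using anything like this in a real import.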
Re: Need additional data processing in Data Import Handler prior to indexing
Third time tonight I've been able to paste this link.

Also, you can consider just moving to SolrJ and taking DIH out of the process, see:
http://searchhub.org/2012/02/14/indexing-with-solrj/

Whichever approach fits your needs, of course.

Best,
Erick
Re: Need additional data processing in Data Import Handler prior to indexing
Thanks guys for your ideas. I will go through them and come back with questions.

Regards,
Dileepa