RE: Need additional data processing in Data Import Handler prior to indexing

2013-10-29 Thread Dyer, James
Would an onImportEnd event listener serve your needs?

See http://wiki.apache.org/solr/DataImportHandler#EventListeners
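An onImportEnd listener fires once, after the whole import has finished. Using it for Stanbol enrichment would therefore mean a second pass over documents that are already indexed, not a per-row step. The Java sketch below shows that shape under some assumptions. `ImportEndListener` is a simplified stand-in for DIH's `EventListener`, whose real method is `onEvent(Context)`. The imported documents are assumed to be available as an id-to-text map. The extractor function stands in for a call to the Stanbol enhancer.

```java
import java.util.*;
import java.util.function.Function;

// Simplified stand-in for DIH's EventListener; the real interface is
// org.apache.solr.handler.dataimport.EventListener with onEvent(Context).
interface ImportEndListener {
    void onEvent();
}

// Hypothetical second pass run when the import ends: every imported
// document's text goes through an entity extractor, and documents with
// detected persons are recorded as pending updates.
class PersonEnrichmentPass implements ImportEndListener {
    private final Map<String, String> importedText;               // doc id -> text
    private final Function<String, List<String>> extractPersons;  // stands in for Stanbol
    private final Map<String, List<String>> updates = new LinkedHashMap<>();

    PersonEnrichmentPass(Map<String, String> importedText,
                         Function<String, List<String>> extractPersons) {
        this.importedText = importedText;
        this.extractPersons = extractPersons;
    }

    @Override
    public void onEvent() {
        for (Map.Entry<String, String> e : importedText.entrySet()) {
            List<String> persons = extractPersons.apply(e.getValue());
            if (!persons.isEmpty()) {
                // In a real setup this would become an update sent back to Solr.
                updates.put(e.getKey(), new ArrayList<>(persons));
            }
        }
    }

    Map<String, List<String>> updates() {
        return updates;
    }
}
```

The resulting id-to-persons map is what would then be sent back to Solr. The cost is indexing every enriched document twice.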

James Dyer
Ingram Content Group
(615) 213-4311

-----Original Message-----
From: Dileepa Jayakody [mailto:dileepajayak...@gmail.com] 
Sent: Tuesday, October 29, 2013 3:48 PM
To: solr-user@lucene.apache.org
Subject: Need additional data processing in Data Import Handler prior to indexing

Hi All,

I'm a newbie to Solr, and I have a requirement to import data from a MySQL
database, enhance the imported content to identify the persons mentioned in
it, and index those persons as a separate field in Solr along with the other
fields defined for the original DB query.

I'm using Apache Stanbol [1] for the content enhancement requirement.
I can get the 'Person'-type entities in the content back as enhancement
results.

The data flow will be:
mysql-db -> Solr data-import handler -> Stanbol enhancer -> Solr index

For the above requirement, I need to perform additional processing in the
data import handler before indexing: send a request to Stanbol and process
the enhancement response. I found some related examples of customizing the
query results of a MySQL data import in db-data-config.xml using a script
transformer. In my case, though, the data import handler needs to send a
request to Stanbol and process the response before indexing, and I'm not
sure whether that can be achieved with a simple JavaScript transformer.

Is there a better way of achieving my requirement? Maybe by writing a
custom filter in Solr?
Please share your thoughts. I'd appreciate any pointers, as I'm a beginner
with Solr.

Thanks,
Dileepa


[1] https://stanbol.apache.org



Re: Need additional data processing in Data Import Handler prior to indexing

2013-10-29 Thread Michael Della Bitta
Hi Dileepa,

You can write your own Transformers in Java. If it doesn't make sense to
run Stanbol calls in a Transformer, you could set up a web service that
grabs a record out of MySQL, sends the data to Stanbol, and returns the
results; DIH could then read from that service using HttpDataSource rather
than JdbcDataSource.

http://wiki.apache.org/solr/DIHCustomTransformer
http://wiki.apache.org/solr/DataImportHandler#Usage_with_XML.2FHTTP_Datasource
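A custom Transformer along the lines Michael describes might look roughly like this. It is a sketch, not a tested plugin. The column name `content`, the field name `persons`, and the enhancer function, which stands in for an HTTP call to Stanbol, are all hypothetical. It uses the plain `transformRow(Map<String, Object>)` method form, which the DIHCustomTransformer page describes as an alternative to extending `org.apache.solr.handler.dataimport.Transformer`.

```java
import java.util.*;
import java.util.function.Function;

// Sketch of a DIH custom transformer (declare it public in a real plugin).
// DIH can call a plain public Object transformRow(Map<String, Object>)
// method; extending org.apache.solr.handler.dataimport.Transformer is the
// alternative, with an extra Context parameter.
class StanbolPersonTransformer {
    // Hypothetical names: adjust to the query columns and schema fields.
    static final String TEXT_COLUMN = "content";
    static final String PERSON_FIELD = "persons";

    // Stands in for an HTTP call to the Stanbol enhancer that returns
    // the names of 'Person' entities found in the text.
    private final Function<String, List<String>> enhancer;

    // DIH needs a no-arg constructor; a real version would set up its
    // Stanbol client here instead of this placeholder that finds nothing.
    public StanbolPersonTransformer() {
        this(text -> Collections.<String>emptyList());
    }

    StanbolPersonTransformer(Function<String, List<String>> enhancer) {
        this.enhancer = enhancer;
    }

    public Object transformRow(Map<String, Object> row) {
        Object text = row.get(TEXT_COLUMN);
        if (text != null) {
            List<String> persons = enhancer.apply(text.toString());
            if (!persons.isEmpty()) {
                // A List value is meant for a multiValued schema field.
                row.put(PERSON_FIELD, persons);
            }
        }
        return row; // returning the (modified) row keeps it in the import
    }
}
```

The entity would reference it through its transformer attribute, and `persons` would need to be multiValued in the schema. Calling Stanbol once per row keeps the logic simple but adds a network round trip to every document. That cost may be one reason a Transformer might not make sense here.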

Michael Della Bitta

Applications Developer

o: +1 646 532 3062  | c: +1 917 477 7906

appinions inc.

“The Science of Influence Marketing”

18 East 41st Street

New York, NY 10017

t: @appinions https://twitter.com/Appinions | g+:
plus.google.com/appinions https://plus.google.com/u/0/b/112002776285509593336/112002776285509593336/posts
w: appinions.com http://www.appinions.com/



Re: Need additional data processing in Data Import Handler prior to indexing

2013-10-29 Thread Alexandre Rafalovitch
It's also possible to combine an Update Request Processor with DIH. That
way, if a debug entry needs to be inserted, it could go through the same
Stanbol process.

Just define a processing chain for the DIH handler and write a custom URP to
call out to the Stanbol web service. You have access to the full record in
the URP, so you can add/delete/change fields at will.
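The URP approach relies on chaining. Each processor may modify the document and then passes it to the next one, and the last step actually indexes it. The sketch models that with simplified stand-in classes rather than Solr's real `UpdateRequestProcessor`, where the hook is `processAdd(AddUpdateCommand)` and the document comes from `cmd.getSolrInputDocument()`. The field names and the enhancer function are hypothetical.

```java
import java.util.*;
import java.util.function.Function;

// Simplified stand-in for Solr's UpdateRequestProcessor: a processor may
// change the document, then hands it to the next processor in the chain.
abstract class DocProcessor {
    private final DocProcessor next;

    DocProcessor(DocProcessor next) {
        this.next = next;
    }

    void processAdd(Map<String, Object> doc) {
        if (next != null) {
            next.processAdd(doc);
        }
    }
}

// Hypothetical custom URP: asks the enhancer (standing in for the Stanbol
// web service) for persons and adds them before passing the document on.
class PersonEnrichingProcessor extends DocProcessor {
    private final String sourceField;
    private final String targetField;
    private final Function<String, List<String>> enhancer;

    PersonEnrichingProcessor(DocProcessor next, String sourceField, String targetField,
                             Function<String, List<String>> enhancer) {
        super(next);
        this.sourceField = sourceField;
        this.targetField = targetField;
        this.enhancer = enhancer;
    }

    @Override
    void processAdd(Map<String, Object> doc) {
        Object text = doc.get(sourceField);
        if (text != null) {
            doc.put(targetField, enhancer.apply(text.toString()));
        }
        super.processAdd(doc); // continue down the chain
    }
}

// Last link, standing in for the processor that actually indexes documents.
class CollectingProcessor extends DocProcessor {
    final List<Map<String, Object>> indexed = new ArrayList<>();

    CollectingProcessor() {
        super(null);
    }

    @Override
    void processAdd(Map<String, Object> doc) {
        indexed.add(doc);
    }
}
```

In Solr itself, the chain would be declared in solrconfig.xml and attached to the DIH request handler, for example through its `update.chain` parameter in Solr 4.x (worth checking against the version in use). Because the processor sees every added document, updates sent outside DIH would get the same enrichment if they use the same chain.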

Regards,
   Alex.

Personal website: http://www.outerthoughts.com/
LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
- Time is the quality of nature that keeps events from happening all at
once. Lately, it doesn't seem to be working.  (Anonymous  - via GTD book)



Re: Need additional data processing in Data Import Handler prior to indexing

2013-10-29 Thread Erick Erickson
Third time tonight I've been able to paste this link

Also, you can consider just moving to SolrJ and
taking DIH out of the process, see:
http://searchhub.org/2012/02/14/indexing-with-solrj/

Whichever approach fits your needs, of course.
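The SolrJ route moves the whole pipeline into a standalone program. It reads rows (via JDBC, as in the linked post), enriches them, and sends them to Solr in batches. The sketch keeps only that control flow. `DocumentSink` is a hypothetical stand-in for the SolrJ client call, which in SolrJ 4.x would be something like `HttpSolrServer.add(Collection<SolrInputDocument>)`. The `content`/`persons` field names and the enhancer function are assumptions.

```java
import java.util.*;
import java.util.function.Function;

// Hypothetical stand-in for the SolrJ client call that sends documents,
// e.g. HttpSolrServer.add(Collection<SolrInputDocument>) in SolrJ 4.x.
interface DocumentSink {
    void add(List<Map<String, Object>> batch);
}

// Standalone indexer sketch: rows (read via JDBC in practice) are enriched
// and sent to Solr in fixed-size batches, without DIH.
class EnrichingIndexer {
    private final Function<String, List<String>> enhancer; // stands in for Stanbol
    private final DocumentSink sink;
    private final int batchSize;
    private final List<Map<String, Object>> buffer = new ArrayList<>();

    EnrichingIndexer(Function<String, List<String>> enhancer, DocumentSink sink, int batchSize) {
        this.enhancer = enhancer;
        this.sink = sink;
        this.batchSize = batchSize;
    }

    void index(Map<String, Object> row) {
        Map<String, Object> doc = new HashMap<>(row);
        Object text = doc.get("content"); // hypothetical column name
        if (text != null) {
            doc.put("persons", enhancer.apply(text.toString()));
        }
        buffer.add(doc);
        if (buffer.size() >= batchSize) {
            flush();
        }
    }

    // Sends buffered documents; call once more after the last row.
    void flush() {
        if (buffer.isEmpty()) {
            return;
        }
        sink.add(new ArrayList<>(buffer));
        buffer.clear();
    }
}
```

Batching keeps the number of requests to Solr down. A final `flush()` after the last row sends whatever is left; a commit on the real client would follow.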

Best,
Erick



Re: Need additional data processing in Data Import Handler prior to indexing

2013-10-29 Thread Dileepa Jayakody
Thanks guys for your ideas.

I will go through them and come back with questions.

Regards,
Dileepa

