Hmmm, when you say you use Tika, are you using some custom Java code? Because
if you are, the best thing to do is query your database at that point
and add whatever information
you need to the document.
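
For the custom-code route, here's a minimal sketch of the merge step. The JDBC query and the actual post to /update/extract are stubbed out; lookupFileMetadata, the column names, and the file id are all hypothetical stand-ins, not your actual schema:

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.stream.Collectors;

public class ExtractParams {

    // Stand-in for a real JDBC query, e.g.
    // "SELECT category, type FROM files WHERE id = ?".
    static Map<String, String> lookupFileMetadata(String fileId) {
        Map<String, String> row = new LinkedHashMap<>();
        row.put("category", "contracts");
        row.put("type", "pdf");
        return row;
    }

    // Turn the DB columns into literal.* parameters for /update/extract,
    // so they're indexed alongside the metadata Tika extracts from the file.
    static String buildLiteralParams(String fileId) {
        return lookupFileMetadata(fileId).entrySet().stream()
            .map(e -> "literal." + e.getKey() + "="
                    + URLEncoder.encode(e.getValue(), StandardCharsets.UTF_8))
            .collect(Collectors.joining("&"));
    }

    public static void main(String[] args) {
        // Append this to your extract request along with the file stream.
        System.out.println(buildLiteralParams("doc-123"));
    }
}
```

The same idea works with SolrJ by setting the literal.* values as request params instead of building a query string by hand.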

If you're using DIH to do the crawl, consider implementing a
Transformer to do the database
querying and modify the document as necessary. This is pretty
simple to do; we can
chat a bit more depending on whether either approach makes sense.
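
For the DIH route, something along these lines would do it. DIH accepts any public class with a transformRow(Map) method as a transformer, so the sketch has no compile-time Solr dependency; the class name, field names, and the in-memory map standing in for a real JDBC query are all hypothetical:

```java
import java.util.HashMap;
import java.util.Map;

// DIH calls transformRow(...) once per row when the entity declares
// transformer="com.example.DbEnrichTransformer". (Names are hypothetical.)
public class DbEnrichTransformer {

    // Stand-in for a JDBC query against your files table;
    // in production this would use java.sql with a pooled connection.
    static Map<String, String> lookupCategoryAndType(Object fileId) {
        Map<String, String> db = new HashMap<>();
        db.put("category", "contracts");
        db.put("type", "pdf");
        return db;
    }

    // DIH's "simple" transformer contract: a public class with this
    // method signature works without extending Solr's Transformer class.
    public Object transformRow(Map<String, Object> row) {
        Map<String, String> meta = lookupCategoryAndType(row.get("id"));
        row.put("category", meta.get("category"));
        row.put("type", meta.get("type"));
        return row;
    }
}
```

You'd wire it up by adding transformer="com.example.DbEnrichTransformer" to the relevant entity in data-config.xml and dropping the jar somewhere on Solr's classpath.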

Best
Erick



On Thu, Jun 9, 2011 at 10:43 AM, Greg Georges <greg.geor...@biztree.com> wrote:
> Hello all,
>
> I have checked the forums to see if it is possible to create an index from 
> multiple datasources. I have found references to SOLR-1358, but I don't think 
> this fits my scenario. In all, we have an application where we upload files. 
> On the file upload, I use the Tika extract handler to save metadata from the 
> file (_attr, literal values, etc..). We also have a database which has 
> information on the uploaded files, like the category, type, etc.. I would 
> like to update the index to include this information from the db in the index 
> for each document. If I run a DataImportHandler after the extract phase, I am 
> afraid that updating the doc in the index by its ID will just overwrite the 
> old information with the info from the DB (my understanding is that Solr 
> updates a document by deleting it first and then recreating it).
>
> Anyone have any pointers, is there a clean way to do this, or must I find a 
> way to pass the db metadata to the extract handler and save it as literal 
> fields?
>
> Thanks in advance
>
> Greg
>
