The DataImportHandler can fetch the file path from each database
record, load that file's contents into a field of the same document,
and process the text with Tika.

It will not be easy :) but it is possible.

http://wiki.apache.org/solr/DataImportHandler
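
As a rough sketch only, a data-config.xml along these lines might be a
starting point. The column names come from the query in the question;
the database URL, credentials, table name ("documents"), mount point
and the Solr field name "text" are all assumptions for illustration.
Because the files are plain .txt, this sketch uses
PlainTextEntityProcessor, which reads the file as-is into a column
called plainText, instead of Tika.

```xml
<dataConfig>
  <!-- Assumed MySQL connection details; replace with your own. -->
  <dataSource name="db" type="JdbcDataSource"
              driver="com.mysql.jdbc.Driver"
              url="jdbc:mysql://localhost/mydb"
              user="solr" password="secret"/>

  <!-- Assumed local mount point of the file store on the Solr host. -->
  <dataSource name="files" type="FileDataSource"
              basePath="/mnt/filecloud/" encoding="UTF-8"/>

  <document>
    <!-- Outer entity: one Solr document per metadata row. -->
    <entity name="meta" dataSource="db"
            query="SELECT id, title, description, file_path FROM documents">
      <field column="id" name="id"/>
      <field column="title" name="title"/>
      <field column="description" name="description"/>

      <!-- Inner entity: read the file named by this row's file_path
           and add its contents to the same document. -->
      <entity name="body" dataSource="files"
              processor="PlainTextEntityProcessor"
              url="${meta.file_path}">
        <field column="plainText" name="text"/>
      </entity>
    </entity>
  </document>
</dataConfig>
```

This assumes the files are readable from the Solr machine's file
system. If they are only reachable over HTTP, a URLDataSource would
probably be the data source to try for the inner entity instead.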

On 4/17/10, Serdar Sahin <anlamar...@gmail.com> wrote:
> Hi,
>
> I am rather new to Solr and have a question.
>
> We have around 200.000 txt files which are placed into the file cloud.
> The file path is something similar to this:
>
> file/97/8f/840/fa4-1.txt
> file/a6/9d/ab0/ca2-2.txt etc.
>
> and we also store the metadata (like title, description, tags etc)
> about these files in the mysql server. So, what I want to do is to
> index title, description, tags and other data from mysql, and also get
> the txt file from file server, and link them as one record for
> searching, but I could not figure out how to automate this process.
> I can return the path from the SQL query, e.g. SELECT id, title,
> description, file_path, and then Solr could use this path to retrieve
> the txt file, but I don't know whether that is possible.
>
> What is the best way to index these files with their tag title and
> description without coding in Java (Perl is ok). These txt files are
> large, between 100kb-10mb, so the last option is to store them in the
> database.
>
> Thanks,
>
> Serdar
>


-- 
Lance Norskog
goks...@gmail.com
