The DataImportHandler can let you fetch the file name from the database record, and then load the file as a field and process the text with Tika.
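
Roughly, the data-config.xml could look like the sketch below. This is untested and makes several assumptions: the table name (`files`), the mount point (`/mnt/filecloud`), the JDBC URL and credentials, and the Solr field names are placeholders I made up for illustration. It also assumes the file cloud is mounted on the Solr machine, and that your Solr build includes TikaEntityProcessor and BinFileDataSource, which depends on the version you run.

```xml
<dataConfig>
  <!-- MySQL metadata source; URL, user and password are placeholders -->
  <dataSource name="db" type="JdbcDataSource"
              driver="com.mysql.jdbc.Driver"
              url="jdbc:mysql://localhost/mydb"
              user="solr" password="changeme"/>

  <!-- Reads raw bytes from the local filesystem and hands them to Tika -->
  <dataSource name="files" type="BinFileDataSource"/>

  <document>
    <!-- Outer entity: one row per file; "files" is a hypothetical table name -->
    <entity name="meta" dataSource="db"
            query="SELECT id, title, description, file_path FROM files">
      <field column="id" name="id"/>
      <field column="title" name="title"/>
      <field column="description" name="description"/>

      <!-- Inner entity: runs once per row, opening the path returned by the query.
           /mnt/filecloud is an assumed mount point for the file cloud. -->
      <entity name="body" processor="TikaEntityProcessor" dataSource="files"
              url="/mnt/filecloud/${meta.file_path}" format="text">
        <!-- TikaEntityProcessor exposes the extracted body as the "text" column -->
        <field column="text" name="text"/>
      </entity>
    </entity>
  </document>
</dataConfig>
```

The nested entity is what links the two sources into one record: the inner entity is evaluated for each database row, so the extracted text ends up in the same Solr document as the title and description. Since your files are plain .txt, PlainTextEntityProcessor with a FileDataSource (body in the `plainText` column) may be a simpler choice than Tika.
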
It will not be easy :) but it is possible.

http://wiki.apache.org/solr/DataImportHandler

On 4/17/10, Serdar Sahin <anlamar...@gmail.com> wrote:
> Hi,
>
> I am rather new to Solr and have a question.
>
> We have around 200.000 txt files which are placed into the file cloud.
> The file path is something similar to this:
>
> file/97/8f/840/fa4-1.txt
> file/a6/9d/ab0/ca2-2.txt etc.
>
> and we also store the metadata (like title, description, tags etc)
> about these files in the mysql server. So, what I want to do is to
> index title, description, tags and other data from mysql, and also get
> the txt file from file server, and link them as one record for
> searching, but I could not figure out how to automatize this process.
> I can give the path from the sql query like, Select id, title,
> description, file_path, and then solr can use this path to retrieve
> txt file, but I don't know whether is it possible or not.
>
> What is the best way to index these files with their tag title and
> description without coding in Java (Perl is ok). These txt files are
> large, between 100kb-10mb, so the last option is to store them in the
> database.
>
> Thanks,
>
> Serdar
>

--
Lance Norskog
goks...@gmail.com