Re: Indexing Best Practice

2011-04-12 Thread Darx Oman
Hi Lance, thanks for your reply. I have one question: has this patch been committed to trunk?

Re: Indexing Best Practice

2011-04-11 Thread Lance Norskog
SOLR-1499 is a plug-in for the DIH that uses Solr as a DataSource. This means that you can read the database and the PDFs separately. You could index all of the PDF content in one DIH script. Then, when there's a database update, you have a separate DIH script that reads the old row from Solr and pul

Re: Indexing Best Practice

2011-04-11 Thread Shaun Campbell
If it's of any help, I've split the processing of PDF files from the indexing. I put the PDF content into a text file (but I guess you could load it into a database) and use that as part of the indexing. My processing of the PDF files also compares timestamps on the document and the text file so th
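
A minimal sketch of the timestamp comparison described in this message, in Python. This is not Shaun's actual code: the function name `needs_reextract` and both path parameters are hypothetical, and it assumes the purpose of the comparison is to skip PDFs whose extracted text file is already up to date. The PDF text extraction itself is left out.

```python
import os


def needs_reextract(pdf_path, text_path):
    """Return True if the extracted text file is missing or older than the PDF.

    Hypothetical helper: assumes one text file per PDF and that file
    modification times are a reliable signal of change.
    """
    # No text file yet, so the PDF has never been processed.
    if not os.path.exists(text_path):
        return True
    # Re-extract only when the PDF was modified after the text file was written.
    return os.path.getmtime(pdf_path) > os.path.getmtime(text_path)
```

A batch job could call this for each PDF and run the extraction step only when it returns True, so that the indexing pass reads text files that are already current.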