There's nothing that I know of that takes a DIH configuration and runs it through SolrJ. If you need to parse structured documents, though, you can use Tika directly from SolrJ; see: http://searchhub.org/2012/02/14/indexing-with-solrj/
Yep, you're going to be kind of reinventing the wheel a bit, I'm afraid.

Best,
Erick

On Wed, Nov 13, 2013 at 1:55 PM, P Williams <williams.tricia.l...@gmail.com> wrote:

> Hi All,
>
> I'm building a utility (Java jar) to create SolrInputDocuments and send
> them to an HttpSolrServer using the SolrJ API. The intention is to find an
> efficient way to create documents from a large directory of files (where
> multiple files make one Solr document) and send them to a remote Solr
> instance for update and commit.
>
> I've already solved the problem using the DataImportHandler (DIH), so I have
> a data-config.xml that describes the templated fields and the cross-walking
> of the source(s) to the schema. The original data won't always be
> co-located with the Solr server, which is why I'm looking for another
> option.
>
> I've also already solved the problem using Ant and XSLT to create a
> temporary (and, unfortunately, potentially large) document which the
> UpdateHandler will accept. I couldn't think of a solution that took
> advantage of the XSLT support in the UpdateHandler, because each document
> is created from multiple files. Our current, dated Java-based solution
> significantly outperforms the Ant/XSLT approach in both disk usage and
> time, so I've rejected it and gone back to the drawing board.
>
> Does anyone have any suggestions on how I might reuse my DIH configuration
> in the SolrJ context without reinventing the wheel (or DIH, in this case)?
> If I'm doing something ridiculous, I hope you'll point that out too.
>
> Thanks,
> Tricia
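Since nothing reads a DIH data-config.xml from SolrJ, the cross-walking has to be rewritten in Java. Below is a minimal sketch of one piece of that, collecting several files into one document, under a hypothetical grouping rule: files that share the part of their name before the first '.' (e.g. `rec001.xml` and `rec001.txt`) belong to the same record. It uses only the JDK, and the field names `id` and `source_file` are placeholders, not anything from Tricia's schema. With SolrJ on the classpath, each field map would instead be loaded into a `SolrInputDocument` with `addField(name, value)` and sent with `HttpSolrServer.add(...)` followed by `commit()`.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class FileGrouper {

    /**
     * Hypothetical grouping rule: the part of the name before the first '.'
     * identifies the Solr document. Names with no dot, or a leading dot,
     * are used whole.
     */
    public static String documentKey(String fileName) {
        int dot = fileName.indexOf('.');
        return dot <= 0 ? fileName : fileName.substring(0, dot);
    }

    /** Groups file names by document key; the TreeMap keeps keys sorted. */
    public static Map<String, List<String>> groupByDocument(List<String> fileNames) {
        Map<String, List<String>> groups = new TreeMap<>();
        for (String name : fileNames) {
            groups.computeIfAbsent(documentKey(name), k -> new ArrayList<>()).add(name);
        }
        return groups;
    }

    /**
     * Builds a plain field map for one document. With SolrJ available, each
     * entry would go into a SolrInputDocument via doc.addField(name, value).
     */
    public static Map<String, Object> toFields(String key, List<String> files) {
        Map<String, Object> fields = new LinkedHashMap<>();
        fields.put("id", key);            // placeholder uniqueKey field
        fields.put("source_file", files); // placeholder multi-valued field
        return fields;
    }

    public static void main(String[] args) throws IOException {
        Path dir = Paths.get(args.length > 0 ? args[0] : ".");
        List<String> names = new ArrayList<>();
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(dir)) {
            for (Path p : stream) {
                if (Files.isRegularFile(p)) {
                    names.add(p.getFileName().toString());
                }
            }
        }
        // Directory iteration order is unspecified, so sort for stable output.
        Collections.sort(names);
        for (Map.Entry<String, List<String>> e : groupByDocument(names).entrySet()) {
            System.out.println(toFields(e.getKey(), e.getValue()));
        }
    }
}
```

Run it as `java FileGrouper /path/to/files` to print one field map per grouped record. Extracting text from each file (for example with Tika, as suggested above) would add more fields to the same map before it is sent.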