Re: Using data-config.xml from DIH in SolrJ

Erick Erickson Thu, 14 Nov 2013 06:51:10 -0800

There's nothing that I know of that takes a DIH configuration and
uses it through SolrJ. You can use Tika directly in SolrJ if you
need to parse structured documents though, see:
http://searchhub.org/2012/02/14/indexing-with-solrj/


Yep, I'm afraid you're going to be reinventing the wheel a bit.
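
For what it's worth, the part of the wheel you'd be rebuilding is mostly
the step that gathers several files into one document. Below is a rough
sketch of that step in plain Java, not a drop-in replacement for DIH. It
assumes files are named <docId>.<fieldName> (e.g. rec1.title, rec1.body);
that naming convention and the FileGrouper class are made up for
illustration and don't come from DIH or SolrJ.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical helper: groups files like rec1.title / rec1.body into one field map per document. */
final class FileGrouper {

    private FileGrouper() {}

    /**
     * Returns docId -> (fieldName -> trimmed file content).
     * Assumes the "<docId>.<fieldName>" naming convention; files that do not
     * match it (no dot, leading dot, trailing dot) are skipped.
     */
    static Map<String, Map<String, String>> groupByBaseName(Path dir) throws IOException {
        Map<String, Map<String, String>> docs = new TreeMap<>();
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(dir)) {
            for (Path file : stream) {
                if (!Files.isRegularFile(file)) {
                    continue; // ignore subdirectories and special files
                }
                String name = file.getFileName().toString();
                int dot = name.lastIndexOf('.');
                if (dot <= 0 || dot == name.length() - 1) {
                    continue; // no document id or no field name
                }
                String docId = name.substring(0, dot);
                String field = name.substring(dot + 1);
                String content =
                        new String(Files.readAllBytes(file), StandardCharsets.UTF_8).trim();
                docs.computeIfAbsent(docId, k -> new TreeMap<>()).put(field, content);
            }
        }
        return docs;
    }
}

// With SolrJ 4.x on the classpath, each grouped entry could then be sent
// roughly like this (shown as comments so the sketch compiles without SolrJ):
//   SolrInputDocument doc = new SolrInputDocument();
//   doc.addField("id", docId);
//   doc.addField(fieldName, value);   // once per field in the inner map
//   server.add(doc);                  // server is an HttpSolrServer
//   server.commit();                  // once, after all documents are added
```

The field mapping from your data-config.xml would still have to be redone
by hand on top of something like this; the sketch only covers the
many-files-to-one-document part.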

Best,
Erick


On Wed, Nov 13, 2013 at 1:55 PM, P Williams
<williams.tricia.l...@gmail.com> wrote:

> Hi All,
>
> I'm building a utility (Java jar) to create SolrInputDocuments and send
> them to an HttpSolrServer using the SolrJ API.  The intention is to find an
> efficient way to create documents from a large directory of files (where
> multiple files make one Solr document) and send them to a remote Solr
> instance for update and commit.
>
> I've already solved the problem using the DataImportHandler (DIH) so I have
> a data-config.xml that describes the templated fields and cross-walking of
> the source(s) to the schema.  The original data won't always be able to be
> co-located with the Solr server which is why I'm looking for another
> option.
>
> I've also already solved the problem using ant and XSLT to create a
> temporary (and, unfortunately, potentially large) document which the
> UpdateHandler will accept.  I couldn't think of a solution that took
> advantage of the XSLT support in the UpdateHandler because each document is
> created from multiple files.  Our current, dated Java-based solution
> significantly outperforms the ant/XSLT approach in terms of disk and time,
> so I've rejected that approach and gone back to the drawing board.
>
> Does anyone have any suggestions on how I might be able to reuse my DIH
> configuration in the SolrJ context without re-inventing the wheel (or DIH
> in this case)?  If I'm doing something ridiculous I hope you'll point that
> out too.
>
> Thanks,
> Tricia
>
