With time shaken loose, IMO what we'd ideally do (probably under https://issues.apache.org/jira/browse/SOLR-7188) is create an update processor that *forwards* to a _real_ Solr collection's update handler, and fire up EmbeddedSolrServer in a client-side command-line tool that can run /update/extract, DIH stuff, etc. The tool does what Solr does now to extract, parse, and build documents, and then forwards them via javabin to a live Solr collection. I'm not sure that SOLR-7188 currently spells it out like that, but it is a nice, clean, straightforward path from DIH and Tika embedded inside a real Solr cluster to leveraging and scaling them on their own. We'd lose the DIH admin UI, but that's ok by me.
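
To make the forwarding idea a bit more concrete, here is a rough sketch of just the client-side batching step. Everything in it is hypothetical (the ForwardingSketch and BatchForwarder names, the batch size, and the in-memory sink), and it deliberately leaves out Solr itself: in a real tool the sink would be whatever sends the batch to the live collection, and the documents would come from the embedded extract/DIH pipeline rather than a loop.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;

/** Hypothetical sketch: batch locally built documents and forward each batch to a sink. */
public class ForwardingSketch {

    /** Buffers documents and hands them to the sink in fixed-size batches. */
    static final class BatchForwarder {
        private final int batchSize;
        private final Consumer<List<Map<String, Object>>> sink;
        private final List<Map<String, Object>> buffer = new ArrayList<>();

        BatchForwarder(int batchSize, Consumer<List<Map<String, Object>>> sink) {
            if (batchSize < 1) {
                throw new IllegalArgumentException("batchSize must be >= 1");
            }
            this.batchSize = batchSize;
            this.sink = sink;
        }

        /** Adds one document; forwards the buffer once it reaches batchSize. */
        void add(Map<String, Object> doc) {
            buffer.add(doc);
            if (buffer.size() >= batchSize) {
                flush();
            }
        }

        /** Forwards whatever is buffered (a copy, so the sink owns its batch). */
        void flush() {
            if (buffer.isEmpty()) {
                return;
            }
            sink.accept(new ArrayList<>(buffer));
            buffer.clear();
        }
    }

    public static void main(String[] args) {
        // Assumption: the sink just records batches in memory. A real tool
        // would send each batch to the live Solr collection instead.
        List<List<Map<String, Object>>> sent = new ArrayList<>();
        BatchForwarder forwarder = new BatchForwarder(2, sent::add);

        // Stand-in for documents produced by the extract/parse/build step.
        for (int i = 1; i <= 5; i++) {
            Map<String, Object> doc = new LinkedHashMap<>();
            doc.put("id", "doc-" + i);
            doc.put("title", "Title " + i);
            forwarder.add(doc);
        }
        forwarder.flush(); // send the final partial batch

        System.out.println(sent.size() + " batches forwarded"); // prints "3 batches forwarded"
    }
}
```

Keeping the sink as a plain Consumer is only a convenience for the sketch: it lets the batching be exercised without a running cluster, and the transport can be swapped in later.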
--
Erik Hatcher, Senior Solutions Architect
http://www.lucidworks.com

> On Dec 15, 2015, at 9:23 AM, Davis, Daniel (NIH/NLM) [C] <daniel.da...@nih.gov> wrote:
>
> I am aware of the problems with the implementation of DIH, but is there any
> problem with the XML-driven data import capability?
> Could it be rewritten (using modern XPath) to run as a part of SolrJ?
>
> I've been interested in that, but I just haven't been able to shake loose the
> time.
>
> -----Original Message-----
> From: Upayavira [mailto:u...@odoko.co.uk]
> Sent: Tuesday, December 15, 2015 5:04 AM
> To: solr-user@lucene.apache.org
> Subject: Re: Is DIH going to be removed from Solr future versions?
>
> I doubt DIH will be "removed". It will more likely be relegated: still
> there, but emphasised less.
>
> Another possibility that has been mooted is to extract it, so that it can run
> outside of Solr. This strikes me as the best option. Having it run inside
> Solr strikes me as architecturally wrong, and also problematic in a SolrCloud
> world. Taking the DIH codebase and running it *outside* Solr, you get the
> best of DIH without the same set of issues.
>
> Upayavira
>
> On Tue, Dec 15, 2015, at 05:47 AM, Anil Cherian wrote:
>> Dear Team,
>>
>> I use DIH extensively and even wrote my own custom transformers in
>> some situations.
>> Recently, during an architecture discussion, one of my team members said
>> that Solr is going to take away DIH from its future versions.
>>
>> Is that true?
>>
>> Also, is using DIH for, say, 2 or 3 million docs a good option for
>> indexing an XML content data set? I am planning to use it either by
>> calling separate entities in parallel or by defining multiple /dataimport
>> handlers in solrconfig.xml.
>>
>> Could you please reply at your earliest convenience, as it is an
>> important decision for us whether to continue on DIH or not!
>>
>> Thanks and Rgds,
>> Anil.