With time shaken loose, what I think we should ideally do (probably under 
https://issues.apache.org/jira/browse/SOLR-7188) is create an update 
processor that *forwards* to a _real_ Solr collection's update handler, and 
fire up EmbeddedSolrServer in a client-side command-line tool that can run 
/update/extract, DIH, etc. The tool does what Solr does now to extract, 
parse, and build documents, and then forwards them via javabin to a live Solr 
collection. I'm not sure that SOLR-7188 currently spells it out like that, 
but it is a nice, clean, straightforward path from DIH and Tika embedded 
inside a real Solr cluster to leveraging and scaling them on their own. We'd 
lose the DIH admin UI, but that's OK by me.
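
A minimal sketch of the forwarding idea, in plain Java with no Solr 
dependency. Everything named here (ForwardingProcessor, processAdd, flush, 
the sender callback) is hypothetical and is not a Solr or SolrJ API. In a 
real tool the callback would presumably hand each batch to a SolrJ client 
pointed at the live collection.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;

/**
 * Hypothetical client-side processor: buffers locally built documents and
 * forwards them in batches instead of indexing them itself.
 */
class ForwardingProcessor {
    private final int batchSize;
    // Assumption: stands in for "send this batch to the live collection".
    private final Consumer<List<Map<String, Object>>> sender;
    private final List<Map<String, Object>> buffer = new ArrayList<>();

    ForwardingProcessor(int batchSize, Consumer<List<Map<String, Object>>> sender) {
        if (batchSize < 1) {
            throw new IllegalArgumentException("batchSize must be >= 1");
        }
        this.batchSize = batchSize;
        this.sender = sender;
    }

    /** Accepts one built document; forwards once the buffer reaches batchSize. */
    void processAdd(Map<String, Object> doc) {
        buffer.add(doc);
        if (buffer.size() >= batchSize) {
            flush();
        }
    }

    /** Forwards whatever is buffered; call once more at the end of a run. */
    void flush() {
        if (buffer.isEmpty()) {
            return;
        }
        // Pass a copy so the callback never sees the buffer being cleared.
        sender.accept(new ArrayList<>(buffer));
        buffer.clear();
    }
}
```

With a batch size of 2 and five documents, the callback is invoked three 
times, the last time by the final flush() with a single document.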

—
Erik Hatcher, Senior Solutions Architect
http://www.lucidworks.com



> On Dec 15, 2015, at 9:23 AM, Davis, Daniel (NIH/NLM) [C] 
> <daniel.da...@nih.gov> wrote:
> 
> I am aware of the problems with the implementation of DIH, but is there any 
> problem with the XML-driven data import capability?
> Could it be rewritten (using modern XPath) to run as part of SolrJ?
> 
> I've been interested in that, but I just haven't been able to shake loose the 
> time.
> 
> -----Original Message-----
> From: Upayavira [mailto:u...@odoko.co.uk] 
> Sent: Tuesday, December 15, 2015 5:04 AM
> To: solr-user@lucene.apache.org
> Subject: Re: Is DIH going to be removed from Solr future versions?
> 
> I doubt DIH will be "removed". It more likely will be relegated - still 
> there, but emphasised less.
> 
> Another possibility that has been mooted is to extract it, so that it can run 
> outside of Solr. This strikes me as the best option. Having it run inside 
> Solr strikes me as architecturally wrong, and also problematic in a SolrCloud 
> world. By taking the DIH codebase and running it *outside* Solr, you get 
> the best of DIH without the same set of issues.
> 
> Upayavira
> 
> On Tue, Dec 15, 2015, at 05:47 AM, Anil Cherian wrote:
>> Dear Team,
>> 
>> I use DIH extensively and even wrote my own custom transformers in 
>> some situations.
>> Recently, during an architecture discussion, one of my team members told 
>> me that Solr is going to remove DIH from its future versions.
>> 
>> Is that true?
>> 
>> Also, is using DIH for, say, 2 or 3 million docs a good option for 
>> indexing an XML content data set? I am planning to use it either by 
>> calling separate entities in parallel or by defining multiple 
>> /dataimport handlers in solrconfig.xml.
>> 
>> Could you please reply at your earliest convenience, as it is an 
>> important decision for us whether to continue with DIH or not!
>> 
>> Thanks and Rgds,
>> Anil.
