The Solr 6.5.1 DIH setup has a (somewhat broken) RSS example, redone as an Atom example in 6.6, that shows how to fetch content from an https URL. You can see the Atom example here:
https://github.com/apache/lucene-solr/blob/releases/lucene-solr/6.6.0/solr/example/example-DIH/solr/atom/conf/atom-data-config.xml
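
For what it's worth, here is a rough sketch of how the nested-entity approach might look over https. It assumes (and this is purely an assumption) that the remote server can serve an XML index of the files. The URL https://files.example.com/hbk/index.xml, the /files/file/@url layout and the entity name "filelist" are all made up for illustration, and the inner entity is a trimmed version of the one in the config quoted below.

```xml
<dataConfig>
  <!-- URLDataSource fetches over http/https; the timeouts (in ms) are illustrative -->
  <dataSource name="remote" type="URLDataSource" encoding="UTF-8"
              connectionTimeout="5000" readTimeout="30000" />
  <document>
    <!-- Outer entity: reads a HYPOTHETICAL index such as
         <files><file url="https://files.example.com/hbk/a.xml"/></files> -->
    <entity name="filelist"
            processor="XPathEntityProcessor"
            dataSource="remote"
            rootEntity="false"
            url="https://files.example.com/hbk/index.xml"
            forEach="/files/file">
      <field column="fileUrl" xpath="/files/file/@url" />

      <!-- Inner entity: fetches and parses each listed file in turn -->
      <entity name="xml"
              processor="XPathEntityProcessor"
              dataSource="remote"
              stream="true"
              url="${filelist.fileUrl}"
              forEach="/eflow/section | /eflow/section/item">
        <field column="sectionId" xpath="/eflow/section/@id" commonField="true" />
        <field column="itemId" xpath="/eflow/section/item/@id" />
        <field column="itemTitle" xpath="/eflow/section/item/@title" />
      </entity>
    </entity>
  </document>
</dataConfig>
```

This only works as written if the file list really is XML. A plain-text list or an HTML directory listing would need a different outer entity.
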
The main issue, however, is that you have not said what format the list of files on the server is in. Is it a plain list? Is it XML listing the files? Are you doing a directory listing?

Regards,
   Alex.
----
http://www.solr-start.com/ - Resources for Solr users, new and experienced

On 12 June 2017 at 14:11, Miller, William K - Norman, OK - Contractor <william.k.mil...@usps.gov.invalid> wrote:
> Thank you for your response. That is the issue that I am having. I cannot
> figure out how to get the list of files from the remote server. I have tried
> changing the parent Entity Processor to the XPathEntityProcessor and the
> baseDir to a url using https. This did not work as it was looking for a
> "foreach" attribute. Is there an Entity Processor that can be used to get
> the list of files from an https source or am I going to have to use solrj or
> create a custom entity processor?
>
> ~~~~~~~~~~~~~~~~~~~~~~~
> William Kevin Miller
>
> ECS Federal, Inc.
> USPS/MTSC
> (405) 573-2158
>
> -----Original Message-----
> From: Alexandre Rafalovitch [mailto:arafa...@gmail.com]
> Sent: Monday, June 12, 2017 12:57 PM
> To: solr-user
> Subject: Re: DIH issue with streaming xml file
>
> How do you get a list of URLs for the files on the remote server? That's
> probably the first issue. Once you have the URLs in an outside entity or two,
> you can feed them one by one into the inner entity.
>
> Regards,
>    Alex.
>
> ----
> http://www.solr-start.com/ - Resources for Solr users, new and experienced
>
> On 12 June 2017 at 09:39, Miller, William K - Norman, OK - Contractor <
> william.k.mil...@usps.gov.invalid> wrote:
>
>> I am using Solr 6.5.1 and working on importing xml files using the
>> DataImportHandler. I am wanting to get the files from a remote
>> server, but I am dealing with multiple xml files in multiple folders.
>> I am using a nested entity in my dataConfig. Below is an example of
>> how I have my dataConfig set up. I got most of this from an online
>> reference. In this example I am getting the xml files from a folder
>> on the Solr server, but as I mentioned above I want to get the files
>> from a remote server. I have looked at the different Entity
>> Processors for the DIH, but have not seen anything that seems to work.
>> Is there a way to configure the below code to let me do this?
>>
>> <dataConfig>
>>
>>   <dataSource name="hbk" encoding="UTF-8" type="FileDataSource" />
>>
>>   <document name="hbk">
>>
>>     <!--
>>       Pickupdir fetches all files matching the filename regex in
>>       the supplied directory and passes them to other entities
>>       which parse the file contents.
>>     -->
>>     <entity
>>       name="pickupdir"
>>       processor="FileListEntityProcessor"
>>       rootEntity="false"
>>       dataSource="null"
>>       fileName="^[\w\d-]+\.xml$"
>>       baseDir="/var/solr/data/hbk/data/xml/"
>>       recursive="true"
>>     >
>>
>>       <!--
>>         Pickupxmlfile parses standard Solr update XML.
>>       -->
>>       <entity
>>         name="xml"
>>         pk="itemId"
>>         processor="XPathEntityProcessor"
>>         transformer="RegexTransformer,TemplateTransformer"
>>         datasource="pickupdir"
>>         stream="true"
>>         xsl="/var/solr/data/hbk/data/xsl/solr_timdex.xsl"
>>         url="${pickupdir.fileAbsolutePath}"
>>         forEach="/eflow/section | /eflow/section/item"
>>       >
>>
>>         <field column="sectionId" xpath="/eflow/section/@id" commonField="true" />
>>         <field column="sectionTitle" xpath="/eflow/section/@title" commonField="true" />
>>         <field column="sectionNo" xpath="/eflow/section/@secno" commonField="true" />
>>         <field column="hbkNo" xpath="/eflow/section/@hbkno" commonField="true" />
>>         <field column="volumeNo" xpath="/eflow/section/@volno" commonField="true" />
>>
>>         <field column="itemId" xpath="/eflow/section/item/@id" />
>>         <field column="itemTitle" xpath="/eflow/section/item/@title" />
>>         <field column="itemNo" xpath="/eflow/section/item/@mit" />
>>         <field column="itemFile" xpath="/eflow/section/item/@file" />
>>         <field column="itemType" xpath="/eflow/section/item/@type" />
>>
>>       </entity>
>>
>>     </entity>
>>
>>   </document>
>>
>> </dataConfig>
>>
>> ~~~~~~~~~~~~~~~~~~~~~~~
>> William Kevin Miller
>>
>> ECS Federal, Inc.
>> USPS/MTSC
>> (405) 573-2158