The Solr 6.5.1 DIH setup ships with a somewhat broken RSS example
(redone as an ATOM example in 6.6) that shows how to fetch content from
an https URL. You can see the ATOM example here:
https://github.com/apache/lucene-solr/blob/releases/lucene-solr/6.6.0/solr/example/example-DIH/solr/atom/conf/atom-data-config.xml


The main issue, however, is that you have not said what format the list
of files on the server is in. Is it a plain list? Is it XML that lists
the files? Are you relying on a directory listing?
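
For illustration only: if the list turned out to be an XML file served
over https, a nested setup along these lines might be a starting point.
This is a rough sketch, not a tested config. The listing URL, the
/files/file structure, its "url" attribute, the "remote" and "filelist"
names and the timeout values are all made up for the example; only the
/eflow/section/item paths come from your config.

```xml
<dataConfig>
  <!-- URLDataSource fetches over http/https; timeout values are illustrative -->
  <dataSource name="remote" type="URLDataSource" encoding="UTF-8"
              connectionTimeout="5000" readTimeout="30000" />
  <document>
    <!-- Outer entity: reads a HYPOTHETICAL listing shaped like
         <files><file url="https://example.com/xml/a.xml"/></files> -->
    <entity name="filelist"
            processor="XPathEntityProcessor"
            dataSource="remote"
            rootEntity="false"
            url="https://example.com/xml/filelist.xml"
            forEach="/files/file">
      <field column="fileUrl" xpath="/files/file/@url" />

      <!-- Inner entity: fetches and parses one remote file per outer row -->
      <entity name="xml"
              processor="XPathEntityProcessor"
              dataSource="remote"
              stream="true"
              url="${filelist.fileUrl}"
              forEach="/eflow/section/item">
        <field column="itemId" xpath="/eflow/section/item/@id" />
        <field column="itemTitle" xpath="/eflow/section/item/@title" />
      </entity>
    </entity>
  </document>
</dataConfig>
```

The outer entity needs its own forEach because XPathEntityProcessor
emits one row per matched node, and each row's fileUrl is then
substituted into the inner entity's url. A plain-text list or an HTML
directory listing would need a different outer processor.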

Regards,
   Alex.
----
http://www.solr-start.com/ - Resources for Solr users, new and experienced


On 12 June 2017 at 14:11, Miller, William K - Norman, OK - Contractor
<william.k.mil...@usps.gov.invalid> wrote:
> Thank you for your response.  That is the issue that I am having.  I cannot 
> figure out how to get the list of files from the remote server.  I have tried 
> changing the parent Entity Processor to the XPathEntityProcessor and the 
> baseDir to a url using https.  This did not work as it was looking for a 
> "foreach" attribute.  Is there an Entity Processor that can be used to get 
> the list of files from an https source or am I going to have to use solrj or 
> create a custom entity processor?
>
>
>
>
> ~~~~~~~~~~~~~~~~~~~~~~~
> William Kevin Miller
>
> ECS Federal, Inc.
> USPS/MTSC
> (405) 573-2158
>
>
> -----Original Message-----
> From: Alexandre Rafalovitch [mailto:arafa...@gmail.com]
> Sent: Monday, June 12, 2017 12:57 PM
> To: solr-user
> Subject: Re: DIH issue with streaming xml file
>
> How do you get a list of URLs for the files on the remote server? That's 
> probably the first issue. Once you have the URLs in an outside entity or two, 
> you can feed them one by one into the inner entity.
>
> Regards,
>    Alex.
>
> ----
> http://www.solr-start.com/ - Resources for Solr users, new and experienced
>
> On 12 June 2017 at 09:39, Miller, William K - Norman, OK - Contractor < 
> william.k.mil...@usps.gov.invalid> wrote:
>
>> I am using Solr 6.5.1 and working on importing XML files using the
>> DataImportHandler.  I want to get the files from a remote
>> server, but I am dealing with multiple XML files in multiple folders.
>> I am using a nested entity in my dataConfig.  Below is an example of
>> how I have my dataConfig set up.  I got most of this from an online
>> reference.  In this example I am getting the XML files from a folder
>> on the Solr server, but as I mentioned above I want to get the files
>> from a remote server.  I have looked at the different Entity
>> Processors for the DIH, but have not seen anything that seems to work.
>> Is there a way to configure the code below to let me do this?
>>
>>
>>
>>
>>
>> <dataConfig>
>>   <dataSource name="hbk" encoding="UTF-8" type="FileDataSource" />
>>   <document name="hbk">
>>     <!--
>>       Pickupdir fetches all files matching the filename regex in the
>>       supplied directory and passes them to other entities which parse
>>       the file contents.
>>     -->
>>     <entity
>>       name="pickupdir"
>>       processor="FileListEntityProcessor"
>>       rootEntity="false"
>>       dataSource="null"
>>       fileName="^[\w\d-]+\.xml$"
>>       baseDir="/var/solr/data/hbk/data/xml/"
>>       recursive="true"
>>     >
>>       <!--
>>         Pickupxmlfile parses standard Solr update XML.
>>       -->
>>       <entity
>>         name="xml"
>>         pk="itemId"
>>         processor="XPathEntityProcessor"
>>         transformer="RegexTransformer,TemplateTransformer"
>>         datasource="pickupdir"
>>         stream="true"
>>         xsl="/var/solr/data/hbk/data/xsl/solr_timdex.xsl"
>>         url="${pickupdir.fileAbsolutePath}"
>>         forEach="/eflow/section | /eflow/section/item"
>>       >
>>         <field column="sectionId" xpath="/eflow/section/@id" commonField="true" />
>>         <field column="sectionTitle" xpath="/eflow/section/@title" commonField="true" />
>>         <field column="sectionNo" xpath="/eflow/section/@secno" commonField="true" />
>>         <field column="hbkNo" xpath="/eflow/section/@hbkno" commonField="true" />
>>         <field column="volumeNo" xpath="/eflow/section/@volno" commonField="true" />
>>
>>         <field column="itemId" xpath="/eflow/section/item/@id" />
>>         <field column="itemTitle" xpath="/eflow/section/item/@title" />
>>         <field column="itemNo" xpath="/eflow/section/item/@mit" />
>>         <field column="itemFile" xpath="/eflow/section/item/@file" />
>>         <field column="itemType" xpath="/eflow/section/item/@type" />
>>       </entity>
>>     </entity>
>>   </document>
>> </dataConfig>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>> ~~~~~~~~~~~~~~~~~~~~~~~
>>
>> William Kevin Miller
>>
>> ECS Federal, Inc.
>>
>> USPS/MTSC
>>
>> (405) 573-2158
>>
>>
>>
