Hi Kaya,

You should be able to use the existing Solr connector to index documents
into Solr.
You will probably need to write a repository connector to access the REST
API you describe.
If the kind of scraping you need to do can be covered by the html-extractor
transformer in its current form, you can insert it into the job's pipeline
between the repository connection and the Solr output connection, and you
should be all set.
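To make the order of the three connections concrete, here is a minimal sketch. It assumes nothing about ManifoldCF's real connector API: `Repository`, `Transformer`, `Output`, `Doc` and `runJob` are hypothetical stand-ins. They only model how a job passes each fetched document from the repository connection, through any transformation connections, to the output connection. The naive tag stripping is a placeholder for html-extractor, not a description of what it does.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

// Hypothetical types: NOT the ManifoldCF connector API. They only model
// the order in which a job hands a document from one connection to the next.
public class PipelineSketch {
    // A fetched document: an identifier plus raw content (e.g. HTML from a REST API).
    record Doc(String id, String content) {}

    // Stand-in for a repository connection (e.g. a custom REST API connector).
    interface Repository { List<Doc> fetch(); }

    // Stand-in for a transformation connection (e.g. html-extractor).
    interface Transformer extends Function<Doc, Doc> {}

    // Stand-in for an output connection (e.g. Solr).
    interface Output { void index(Doc doc); }

    // Each document flows repository -> transformers (in order) -> output.
    static void runJob(Repository repo, List<Transformer> transformers, Output out) {
        for (Doc doc : repo.fetch()) {
            Doc current = doc;
            for (Transformer t : transformers) {
                current = t.apply(current);
            }
            out.index(current);
        }
    }

    public static void main(String[] args) {
        Repository repo = () -> List.of(
                new Doc("item-1", "<html><body><p>Hello</p></body></html>"));
        // Placeholder "extraction": strip all tags with a regex.
        Transformer stripTags = d -> new Doc(d.id(), d.content().replaceAll("<[^>]*>", ""));
        List<String> indexed = new ArrayList<>();
        // Placeholder "Solr": collect what would have been indexed.
        Output solr = d -> indexed.add(d.id() + ":" + d.content());
        runJob(repo, List.of(stripTags), solr);
        System.out.println(indexed); // prints [item-1:Hello]
    }
}
```

In a real job you would configure these stages as connections in the ManifoldCF UI rather than code them; only the repository connector for your REST API would need to be written.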

Karl


On Wed, Feb 20, 2019 at 9:17 PM Kayak28 <[email protected]> wrote:

> Hello, folks:
>
> I have a question about crawling and scraping in ManifoldCF.
> I want to perform the following sequence of tasks using MCF.
>
> 1. crawl data from a RESTful API
> 2. scrape the data
> 3. insert the data into Apache Solr
>
> In this case, is this how I need to set up ManifoldCF?
> 1. define an output connector to access the RESTful API (using the Web
> crawler connector or the Generic connector?)
> 2. define a transformation connector to scrape the HTML (using the
> html-extractor transformer connector?)
> 3. define an output connector for Solr
>
>
> Or do I have to use other software, such as Apache NiFi, to control the
> sequence of these tasks?
>
> I would appreciate any comments and replies.
>
> Sincerely,
> Kaya
>
>
>
