Hello, folks:

I have a question about crawling and scraping in ManifoldCF (MCF).
I want to perform the following sequence of tasks using MCF:

1. crawl data from a RESTful API
2. scrape the data
3. insert the data into Apache Solr
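
For illustration only, the three steps above could be sketched outside MCF
roughly as follows. This is a minimal Python sketch under assumptions, not
how MCF does it: the record shape (an "id" plus an "html" field), the
"content_txt" Solr field, and both URLs passed to the functions are
hypothetical placeholders.

```python
import json
import urllib.request
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects visible text, skipping <script> and <style> contents."""

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0 and data.strip():
            self.chunks.append(data.strip())


def fetch_records(api_url):
    """Step 1: fetch a JSON list of records from the (hypothetical) REST API."""
    with urllib.request.urlopen(api_url) as resp:
        return json.load(resp)


def scrape_text(html):
    """Step 2: reduce an HTML string to its visible text."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


def to_solr_docs(records):
    # Assumed record shape: {"id": ..., "html": ...}; "content_txt" is a placeholder field.
    return [{"id": r["id"], "content_txt": scrape_text(r["html"])} for r in records]


def post_to_solr(update_url, docs):
    """Step 3: send the documents as a JSON array to a Solr update endpoint."""
    req = urllib.request.Request(
        update_url,
        data=json.dumps(docs).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return resp.status
```

The question is whether MCF connectors can cover the same three stages
(fetch, extract, index) in one job.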

In this case, is the following the right way to set up ManifoldCF?

1. define a repository connector to access the RESTful API (using the Web
crawler connector or the Generic connector?)
2. define a transformation connector to scrape the HTML (using the
html-extractor transformation connector...?)
3. define the output connector to be Solr


Or do I have to use other software, such as Apache NiFi, to control the
sequence of these tasks?

I would appreciate any comments and replies.

Sincerely,
Kaya