Hi Kaya,

You should be able to use the existing Solr output connector to index documents into Solr. You will probably need to write a repository connector to access the REST API you describe. If the kind of scraping you need to do can be covered by the html-extractor transformer in its current form, then you can insert it into the job pipeline between the other two connections and you should be all set.
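To make the ordering concrete, here is a minimal sketch of how documents flow through such a pipeline: repository connection, then transformation connection(s), then output connection. It is only a model. `PipelineSketch`, `RepositoryStage`, `TransformStage`, `OutputStage` and `Doc` are hypothetical names and are not ManifoldCF classes. In ManifoldCF the chain is configured on the job's pipeline, and only the repository connector itself would be written in Java, against ManifoldCF's own connector framework, which this sketch does not use.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical model of a ManifoldCF-style job pipeline:
 * repository connection -> transformation connection(s) -> output connection.
 * None of these types are ManifoldCF classes.
 */
public class PipelineSketch {

    /** A fetched document: an identifier plus its raw content. */
    record Doc(String id, String content) {}

    /** Stage 1 (hypothetical): yields documents, e.g. fetched from a REST API. */
    interface RepositoryStage { List<Doc> fetch(); }

    /** Stage 2 (hypothetical): rewrites a document, e.g. extracts part of the HTML. */
    interface TransformStage { Doc transform(Doc doc); }

    /** Stage 3 (hypothetical): receives the final documents, e.g. a Solr output. */
    interface OutputStage { void index(Doc doc); }

    /** Runs every fetched document through each transformer in order, then hands it to the output. */
    static void run(RepositoryStage repo, List<TransformStage> transforms, OutputStage out) {
        for (Doc doc : repo.fetch()) {
            Doc current = doc;
            for (TransformStage t : transforms) {
                current = t.transform(current);
            }
            out.index(current);
        }
    }

    public static void main(String[] args) {
        // Stand-in for a REST-backed repository connector: returns canned HTML instead of calling an API.
        RepositoryStage repo = () -> List.of(
            new Doc("item-1", "<html><body><div class=\"main\">First</div></body></html>"),
            new Doc("item-2", "<html><body><div class=\"main\">Second</div></body></html>"));

        // Stand-in for an HTML extractor: naive tag stripping, NOT how html-extractor actually works.
        TransformStage stripTags = d -> new Doc(d.id(), d.content().replaceAll("<[^>]*>", "").trim());

        // Stand-in for the Solr output connection: collects documents in memory instead of posting to Solr.
        List<Doc> indexed = new ArrayList<>();
        OutputStage solr = indexed::add;

        run(repo, List.of(stripTags), solr);
        // prints "item-1 -> First" then "item-2 -> Second"
        indexed.forEach(d -> System.out.println(d.id() + " -> " + d.content()));
    }
}
```

The point of the model is that the transformer sits between the other two connections and sees every document before the output does, so it can be added or removed without touching the repository or output side.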
Karl

On Wed, Feb 20, 2019 at 9:17 PM Kayak28 <[email protected]> wrote:
> Hello, folks:
>
> I have a question about crawling and scraping in ManifoldCF.
> I want to do the following sequence of tasks using MCF:
>
> 1. crawl data from a RESTful API
> 2. scrape the data
> 3. insert the data into Apache Solr
>
> In this case, is this how I need to set up ManifoldCF?
> 1. define an output connector to access the RESTful API (using the Web crawler
> connector or the Generic connector?)
> 2. define a transformation connector to scrape the HTML (using the html-extractor
> transformer connector?)
> 3. define an output connector to be Solr
>
> Or do I have to use other software, such as Apache NiFi, to control the
> sequence of these tasks?
>
> I appreciate any comments and replies.
>
> Sincerely,
> Kaya
