Hello, Mr. Karl Wright:

Thank you for your quick response. As you mentioned, yes, I am writing my own Repository Connector to access the REST API I want to use.
If I need to do more scraping than the html-extractor provides, then I should write a transformer connector that works the way I want. Is that right? And it is not a good idea to do the scraping in my Repository Connector, is it?

Again, I appreciate your replies to these basic questions.

Sincerely,
Kaya

On Thu, Feb 21, 2019 at 11:26, Karl Wright <[email protected]> wrote:

> Hi Kaya,
>
> You should be able to use the existing Solr connector to index documents
> into Solr.
> You will probably need to write a Repository connector to access the REST
> api you describe.
> If the kind of scraping you need to do can be covered by the
> html-extractor transformer in its current form, then you can insert it into
> the pipeline between the other two connections and you should be all set.
>
> Karl
>
>
> On Wed, Feb 20, 2019 at 9:17 PM Kayak28 <[email protected]> wrote:
>
>> Hello, folks:
>>
>> I have a question about crawling and scraping in Manifold CF.
>> I want to run the following sequence of tasks by using MCF.
>>
>> 1. crawling data from RESTful api
>> 2. scraping data
>> 3. insert the data to Apache Solr
>>
>> In this case, how I need to setup Manifold CF is:
>> 1. define output connector to access RESTful api (by using Web crawler
>> connector or Generic connector? )
>>
>> 2. define transformer connector to scrape html (by using html-extractor
>> transformer connector...?)
>> 3. define output connector to be Solr
>>
>>
>> OR do I have to use other software such as Apache Nifi to control the
>> sequence of these tasks?
>>
>> I appreciate any comments and replies.
>>
>> Sincerely,
>> Kaya
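
To make the division of responsibilities in this thread concrete, here is a minimal sketch of the three stages (fetch from the REST API, scrape, index) kept as separate steps. It is plain Python and does not use the ManifoldCF API. Every name in it (`fetch_documents`, `scrape`, `index`, `run_pipeline`, `TextExtractor`) is hypothetical, the REST call is replaced by a hard-coded document, and Solr is replaced by a list.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects visible text and skips <script> and <style> content."""

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0 and data.strip():
            self.parts.append(data.strip())


def fetch_documents():
    """Repository stage (hypothetical): stands in for calls to the REST API."""
    body = "<html><body><h1>Title</h1><script>x=1</script><p>Hello</p></body></html>"
    return [{"id": "doc-1", "body": body}]


def scrape(document):
    """Transformation stage (hypothetical): reduce the HTML body to plain text."""
    extractor = TextExtractor()
    extractor.feed(document["body"])
    extractor.close()
    return {"id": document["id"], "text": " ".join(extractor.parts)}


def index(document, sink):
    """Output stage (hypothetical): stands in for sending the document to Solr."""
    sink.append(document)


def run_pipeline():
    """Run the three stages in order and return what reached the output stage."""
    indexed = []
    for raw in fetch_documents():
        index(scrape(raw), indexed)
    return indexed
```

Because the fetch step only retrieves documents and the scrape step only transforms them, the scraping logic can be replaced without touching the code that talks to the REST API.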
