Hi,

While this task is trivial to implement with Flink's DataSet API, using readTextFile to read the input and a flatMap function to perform the downloading, it might not be a good idea. The download process is I/O bound and will block the synchronous flatMap function, so the throughput will not be very good.
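To illustrate the point about blocking I/O, here is a minimal sketch in plain Python (it is not Flink code). The download is simulated with a sleep so that it runs without network access; fake_download, crawl_sequential, crawl_concurrent, and the example.invalid URLs are hypothetical names made up for this illustration.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fake_download(url, delay=0.05):
    """Hypothetical stand-in for a blocking HTTP fetch; sleeps instead of using the network."""
    time.sleep(delay)
    return f"content of {url}"


def crawl_sequential(urls):
    # One blocking call at a time, similar to a synchronous map function
    # that waits for each download before taking the next record.
    return [fake_download(u) for u in urls]


def crawl_concurrent(urls, workers=8):
    # Several blocking calls in flight at once; pool.map preserves input order.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(fake_download, urls))


def timed(fn, urls):
    # Return the results together with the elapsed wall-clock time in seconds.
    start = time.perf_counter()
    results = fn(urls)
    return results, time.perf_counter() - start


if __name__ == "__main__":
    urls = [f"http://example.invalid/page{i}" for i in range(16)]
    _, t_seq = timed(crawl_sequential, urls)
    _, t_con = timed(crawl_concurrent, urls)
    print(f"sequential: {t_seq:.2f}s, concurrent: {t_con:.2f}s")
```

In the sequential version the total time grows with the number of URLs times the per-download latency, while the concurrent version overlaps the waiting. A real crawler would also need timeouts, retries, and rate limiting, which this sketch leaves out.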


Until Flink supports asynchronous functions, I suggest you look elsewhere.

An example of a master-worker architecture using Akka can be found here:

https://github.com/typesafehub/activator-akka-distributed-workers


Regards,

Kien



On 8/14/2017 10:09 AM, Eranga Heshan wrote:
Hi all,

I am fairly new to Flink. I have a project in which a list of URLs (stored on one node) needs to be crawled in a distributed fashion. For each URL, the serialized crawl result must then be written to a single text file.

I would like to know whether there are similar projects I can look into, or any ideas on how to implement this.

Thanks & Regards,



        
Eranga Heshan
/Undergraduate/
Computer Science & Engineering
University of Moratuwa
Mobile: +94 71 138 2686
Email: eranga....@gmail.com
