Hi,
I have 50 webservers that are pushing data at 500 Mbit/s via Flume to HDFS.

(i) What is the minimum virtual memory required on the webservers and the NameNode, assuming this is a direct sync to HDFS with no Collector involved?

(ii) In the second case, let's assume that a Flume Collector sits between the webservers and the HDFS cluster. Instead of a direct RPC connection from the webservers to the HDFS cluster, the Flume Collector receives the packets and then forwards them to HDFS. What virtual memory and hardware specifications are required on the Flume Collector, the webservers, and the NameNode?

(iii) Can a webserver push traffic across a WAN to a remote HDFS cluster separated by an RTT of 150 ms, without a Flume Collector?

I would appreciate it if you could get me this info as early as possible.

Asim
