If I were you, I would ask the following questions to get to the answer:

- Forget about Hadoop for a minute and ask yourself how the TCP/IP data is currently being stored: in a file system or an RDBMS?
- Hadoop is for offline batch processing. If you are looking for a real-time streaming solution, there is Storm (from Twitter), which goes well with Kafka (a message queue from LinkedIn), or Spark Streaming (in-memory processing of small batches, similar in spirit to map-reduce), which takes real-time streams. Spark Streaming has a built-in Twitter API, but for your sensors you need to write your own service to poll the data every few seconds and send it in RDD format.
- Storm is complementary to Hadoop. Spark in conjunction with Hadoop will allow you to do both offline and real-time data analytics.
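To make the "write your own service" part concrete, here is a minimal sketch of such a collector using only the Python standard library. It is not a Hadoop, Kafka or Spark component. It assumes each sensor opens a TCP connection and sends newline-delimited text records, which may not match your actual protocol. All names (`SensorHandler`, `SensorServer`, `drain_batch`, `start_server`, `send_readings`) are made up for illustration, and the in-process queue only stands in for whatever you put downstream (for example a Kafka topic or an HBase writer).

```python
import queue
import socket
import socketserver
import threading

# Hypothetical in-process buffer; a real deployment would more likely
# hand records to a message queue such as Kafka.
readings = queue.Queue()


class SensorHandler(socketserver.StreamRequestHandler):
    """Handles one sensor connection; assumes newline-delimited UTF-8 records."""

    def handle(self):
        for raw in self.rfile:
            line = raw.decode("utf-8", errors="replace").strip()
            if line:
                # Store (sensor IP, record) so batches can be grouped by sensor.
                readings.put((self.client_address[0], line))


class SensorServer(socketserver.ThreadingTCPServer):
    """One thread per sensor connection."""

    allow_reuse_address = True
    daemon_threads = True


def start_server(host="127.0.0.1", port=0):
    """Start the collector in a background thread; port 0 picks a free port."""
    server = SensorServer((host, port), SensorHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def drain_batch(max_items=1000, timeout=1.0):
    """Return up to max_items readings, waiting at most `timeout` s for the first."""
    batch = []
    try:
        batch.append(readings.get(timeout=timeout))
        while len(batch) < max_items:
            batch.append(readings.get_nowait())
    except queue.Empty:
        pass
    return batch


def send_readings(host, port, lines):
    """Simulated sensor: opens one connection and sends the given records."""
    with socket.create_connection((host, port)) as conn:
        conn.sendall("".join(f"{line}\n" for line in lines).encode("utf-8"))
```

A periodic job would call `drain_batch()` every few seconds and forward the result to the next stage. If one machine cannot take all the traffic, you can run several such collectors on different hosts and point groups of sensors at each of them.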
On Tue, May 6, 2014 at 10:48 PM, Alex Lee <[email protected]> wrote:

> Sensors' may send tcpip data to server. Each sensor may send tcpip data
> like a stream to the server, the quatity of the sensors and the data rate
> of the data is high.
>
> Firstly, how the data from tcpip can be put into hadoop. It need to do
> some process and store in hbase. Does it need through save to data files
> and put into hadoop or can be done in some direct ways from tcpip. Is there
> any software module can take care of this. Searched that Ganglia Nagios and
> Flume may do it. But when looking into details, ganglia and nagios are
> more for monitoring hadoop cluster itself. Flume is for log files.
>
> Secondly, if the total network traffic from sensors are over the limit of
> one lan port, how to share the loads, is there any component in hadoop to
> make this done automatically.
>
> Any suggestions, thanks.
>
