Re: Fw: Mahout dataset Vectorization

2015-03-26 Thread Ted Dunning
Raghuveer, I am more confused than before. You say that the destination is on the second line. That seems to imply that your data has more than one line per data point. Is this so? That seems to contradict your previous comments.

On Wed, Mar 25, 2015 at 10:20 PM, Raghuveer wrote:
> Thanks
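Whether the data really has one line per data point can be checked mechanically before any vectorization. Below is a minimal sketch. It assumes the three-field "ip, timestamp, bytes_transferred" layout from Raghuveer's sample in this thread. The function name `find_incomplete_lines` is hypothetical and is not part of Mahout or Hadoop. Any line that does not split into exactly three fields would suggest that a data point spans several lines, for example with the destination on a second line.

```python
# Minimal sketch (assumption): each data point should be one line of the form
# "ip, timestamp, bytes_transferred", as in the sample posted in this thread.
# find_incomplete_lines is a hypothetical helper, not a Mahout/Hadoop API.

def find_incomplete_lines(lines, expected_fields=3):
    """Return (line_number, text) for every non-blank line that does not
    split into exactly `expected_fields` comma-separated values."""
    bad = []
    for number, line in enumerate(lines, start=1):
        if not line.strip():
            continue  # skip blank lines
        fields = [f.strip() for f in line.split(",")]
        if len(fields) != expected_fields:
            bad.append((number, line))
    return bad


sample = [
    "192.168.5.34, 1345456765434, 456",
    "192.168.4.24, 1345456765444, 34",
    "192.168.5.34, 1345456765454, 2355",
]
print(find_incomplete_lines(sample))  # → []
```

An empty result is consistent with one record per line. A non-empty result lists the lines to inspect.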

2015-03-25 Thread Raghuveer
Thanks for the reply. By "IPs" I mean records of the form ip, timestamp, bytes_transferred:

    ip, timestamp, bytes_transferred
    192.168.5.34, 1345456765434, 456
    192.168.4.24, 1345456765444, 34
    192.168.5.34, 1345456765454, 2355
    ...

Yes, I have a list of IP addresses, and I have extracted the data from binary files and loaded it into HDFS in text format. Destination I
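To turn records like these into something vector-shaped, one option is to group them by IP and summarize each group. The sketch below is an illustration under stated assumptions, not the method discussed in the thread. The per-IP features (record count, total bytes, mean bytes) are a hypothetical choice, the timestamp is parsed but unused, and the helper names are invented. Converting the result into Mahout's own vector input is a separate step that is not shown here.

```python
from collections import defaultdict

# Sketch (assumption): one "ip, timestamp, bytes_transferred" record per line.
# Helper names and the chosen features are hypothetical, for illustration only.

def parse_record(line):
    """Split one text line into (ip, timestamp, bytes_transferred)."""
    ip, timestamp, nbytes = (f.strip() for f in line.split(","))
    return ip, int(timestamp), int(nbytes)


def per_ip_vectors(lines):
    """Aggregate records into {ip: [record_count, total_bytes, mean_bytes]}."""
    counts = defaultdict(int)
    totals = defaultdict(int)
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        ip, _timestamp, nbytes = parse_record(line)
        counts[ip] += 1
        totals[ip] += nbytes
    return {ip: [counts[ip], totals[ip], totals[ip] / counts[ip]] for ip in counts}


sample = [
    "192.168.5.34, 1345456765434, 456",
    "192.168.4.24, 1345456765444, 34",
    "192.168.5.34, 1345456765454, 2355",
]
print(per_ip_vectors(sample))
# → {'192.168.5.34': [2, 2811, 1405.5], '192.168.4.24': [1, 34, 34.0]}
```

Grouping by source IP gives one row per address, which matches the "list of IP addresses" framing. If the destination IP turns out to be implicit or sits on another line, the grouping key would have to change.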

2015-03-25 Thread Ted Dunning
This is an old question that I just dredged up in my email. There is still a question about your format here. When you say "IPs", do you mean that you have a list of IP addresses? Or is this a server web-log? Does that mean that the destination IP is implicit? If so, you might be able to see a