Generating individual file for each record in clustering

Kasi Subrahmanyam Mon, 10 Feb 2014 22:03:50 -0800

Hi,
I have gone through k-means clustering and canopy clustering. Before
running either one, the text files need to be converted to sequence files
using the seqdirectory utility in Mahout. The input to seqdirectory is a
directory with one file per record, where the filename is the record ID.


But I have more than 10 million records, stored in no more than 5 to 10
text files in HDFS. Creating 10 million files as input to seqdirectory
doesn't seem right. Each text file has one record per line, with the ID and
the record separated by a tab. Is there any other way?

Thanks,
Subbu
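
For illustration, the input layout described above (one record per line, ID
and record separated by a tab) can be split into key/value pairs without
creating one file per record. The following is only a minimal sketch of the
parsing step: the function name parse_records and the sample IDs are
hypothetical, and it does not write a sequence file itself. Each pair it
yields could then be appended to a Hadoop SequenceFile as a Text key and a
Text value, which is the key/value layout seqdirectory produces.

```python
from typing import Iterable, Iterator, Tuple


def parse_records(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (record_id, record_text) pairs from 'id<TAB>text' lines.

    Hypothetical helper for illustration; not part of Mahout or Hadoop.
    """
    for line_number, line in enumerate(lines, start=1):
        line = line.rstrip("\r\n")
        if not line:
            continue  # skip blank lines
        # Split on the first tab only, so tabs inside the record text survive.
        record_id, sep, text = line.partition("\t")
        if not sep:
            raise ValueError(f"line {line_number}: no tab separator found")
        yield record_id, text


if __name__ == "__main__":
    # Made-up sample lines standing in for one of the HDFS text files.
    sample = ["doc1\tfirst record text\n", "doc2\tsecond record\twith a tab\n"]
    for key, value in parse_records(sample):
        print(key, "->", value)
```

Splitting on the first tab only is a design choice: it keeps the record ID
intact as the key and leaves any further tabs in the record text untouched.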
