Hi,
I have gone through k-means clustering and canopy clustering. I can see
that before running clustering we need to convert the text files to
sequence files using a Mahout function called seqdirectory. The input to
this function is a directory with one file per record, where the filename
is the record ID.

But I initially have more than 10 million records, stored in no more than
5 to 10 text files in HDFS.
Creating 10 million files as input to the seqdirectory function doesn't
seem right. Each line of my text files holds one record, with the ID and
the record separated by a tab. Is there any other way?

Thanks,
Subbu
