Hi,
The problem is not with the input path; it's the way KMeans is getting executed. Let
me explain.
I created the CSV->Sequence conversion using map-reduce, so my data is in HDFS.
After that I ran the Canopy MR job, so its output is also in HDFS.
Now both of these are being fed into the KMeans MR job.
If you check the KMeansDriver class, it first tries to create the clusters-0
folder with data; if you don't specify the scheme here, it will write to the
local file system. After that the MR job is started, which expects
clusters-0 in HDFS:
Path priorClustersPath = new Path(output, Cluster.INITIAL_CLUSTERS_DIR);
ClusteringPolicy policy = new KMeansClusteringPolicy(convergenceDelta);
ClusterClassifier prior = new ClusterClassifier(clusters, policy);
prior.writeToSeqFiles(priorClustersPath);
if (runSequential) {
  ClusterIterator.iterateSeq(conf, input, priorClustersPath, output, maxIterations);
} else {
  ClusterIterator.iterateMR(conf, input, priorClustersPath, output, maxIterations);
}
Let me know if I haven't explained this clearly.
On Wed, Mar 12, 2014 at 11:53 AM, Sebastian Schelter <[email protected]> wrote:
> Hi Bikash,
>
> Have you tried adding hdfs:// to your input path? Maybe that helps.
>
> --sebastian
>
>
> On 03/11/2014 11:22 AM, Bikash Gupta wrote:
>
>> Hi,
>>
>> I am running Kmeans in cluster where I am setting the configuration of
>> fs.hdfs.impl and fs.file.impl before hand as mentioned below
>>
>> conf.set("fs.hdfs.impl",org.apache.hadoop.hdfs.
>> DistributedFileSystem.class.getName());
>> conf.set("fs.file.impl",org.apache.hadoop.fs.
>> LocalFileSystem.class.getName());
>>
>> Problem is that cluster-0 directory is getting created in local file
>> system
>> and cluster-1 is getting created in HDFS, and Kmeans map reduce job is
>> unable to find cluster-0 . Please see below the stacktrace
>>
>> 2014-03-11 14:52:15 o.a.m.c.AbstractJob [INFO] Command line arguments: {--clustering=null, --clusters=[/3/clusters-0-final], --convergenceDelta=[0.1], --distanceMeasure=[org.apache.mahout.common.distance.EuclideanDistanceMeasure], --endPhase=[2147483647], --input=[/2/sequence], --maxIter=[100], --method=[mapreduce], --output=[/5], --overwrite=null, --startPhase=[0], --tempDir=[temp]}
>> 2014-03-11 14:52:15 o.a.h.u.NativeCodeLoader [WARN] Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
>> 2014-03-11 14:52:15 o.a.m.c.k.KMeansDriver [INFO] Input: /2/sequence Clusters In: /3/clusters-0-final Out: /5
>> 2014-03-11 14:52:15 o.a.m.c.k.KMeansDriver [INFO] convergence: 0.1 max Iterations: 100
>> 2014-03-11 14:52:16 o.a.h.m.JobClient [WARN] Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
>> 2014-03-11 14:52:17 o.a.h.m.l.i.FileInputFormat [INFO] Total input paths to process : 3
>> 2014-03-11 14:52:19 o.a.h.m.JobClient [INFO] Running job: job_201403111332_0011
>> 2014-03-11 14:52:20 o.a.h.m.JobClient [INFO] map 0% reduce 0%
>> 2014-03-11 14:52:28 o.a.h.m.JobClient [INFO] Task Id : attempt_201403111332_0011_m_000000_0, Status : FAILED
>> 2014-03-11 14:52:28 STDIO [ERROR] java.lang.IllegalStateException: /5/clusters-0
>>     at org.apache.mahout.common.iterator.sequencefile.SequenceFileDirValueIterable.iterator(SequenceFileDirValueIterable.java:78)
>>     at org.apache.mahout.clustering.classify.ClusterClassifier.readFromSeqFiles(ClusterClassifier.java:208)
>>     at org.apache.mahout.clustering.iterator.CIMapper.setup(CIMapper.java:44)
>>     at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:138)
>>     at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:672)
>>     at org.apache.hadoop.mapred.MapTask.run(MapTask.java:330)
>>     at org.apache.hadoop.mapred.Child$4.run(Child.java:268)
>>     at java.security.AccessController.doPrivileged(Native Method)
>>     at javax.security.auth.Subject.doAs(Subject.java:415)
>>     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1438)
>>     at org.apache.hadoop.mapred.Child.main(Child.java:262)
>> Caused by: java.io.FileNotFoundException: File /5/clusters-0
>>
>> Please suggest!
>>
>>
>>
>
--
Thanks & Regards
Bikash Kumar Gupta