Should I raise a JIRA for this?
On Wed, Mar 12, 2014 at 12:31 PM, Bikash Gupta <[email protected]> wrote:

> Hi,
>
> The problem is not with the input path; it is the way KMeans is being
> executed. Let me explain.
>
> I created the CSV->Sequence conversion using MapReduce, so my data is in
> HDFS. After that I ran the Canopy MR job, so its output is also in HDFS.
>
> Both of these are then fed into the KMeans MR job.
>
> If you check the KMeansDriver class, it first tries to create the
> clusters-0 directory and write data into it. If you don't specify the
> scheme here, it writes to the local file system. The MR job that starts
> afterwards then expects clusters-0 to be in HDFS.
>
>     Path priorClustersPath = new Path(output, Cluster.INITIAL_CLUSTERS_DIR);
>     ClusteringPolicy policy = new KMeansClusteringPolicy(convergenceDelta);
>     ClusterClassifier prior = new ClusterClassifier(clusters, policy);
>     prior.writeToSeqFiles(priorClustersPath);
>
>     if (runSequential) {
>       ClusterIterator.iterateSeq(conf, input, priorClustersPath, output,
>           maxIterations);
>     } else {
>       ClusterIterator.iterateMR(conf, input, priorClustersPath, output,
>           maxIterations);
>     }
>
> Let me know if I am not able to explain clearly.
>
>
> On Wed, Mar 12, 2014 at 11:53 AM, Sebastian Schelter <[email protected]> wrote:
>
>> Hi Bikash,
>>
>> Have you tried adding hdfs:// to your input path? Maybe that helps.
>>
>> --sebastian
>>
>>
>> On 03/11/2014 11:22 AM, Bikash Gupta wrote:
>>
>>> Hi,
>>>
>>> I am running KMeans on a cluster, where I set the fs.hdfs.impl and
>>> fs.file.impl configuration beforehand, as shown below:
>>>
>>>     conf.set("fs.hdfs.impl",
>>>         org.apache.hadoop.hdfs.DistributedFileSystem.class.getName());
>>>     conf.set("fs.file.impl",
>>>         org.apache.hadoop.fs.LocalFileSystem.class.getName());
>>>
>>> The problem is that the clusters-0 directory is created in the local
>>> file system while clusters-1 is created in HDFS, and the KMeans
>>> MapReduce job is unable to find clusters-0.
>>> Please see the stack trace below:
>>>
>>> 2014-03-11 14:52:15 o.a.m.c.AbstractJob [INFO] Command line arguments:
>>> {--clustering=null, --clusters=[/3/clusters-0-final],
>>> --convergenceDelta=[0.1],
>>> --distanceMeasure=[org.apache.mahout.common.distance.EuclideanDistanceMeasure],
>>> --endPhase=[2147483647], --input=[/2/sequence], --maxIter=[100],
>>> --method=[mapreduce], --output=[/5], --overwrite=null, --startPhase=[0],
>>> --tempDir=[temp]}
>>> 2014-03-11 14:52:15 o.a.h.u.NativeCodeLoader [WARN] Unable to load
>>> native-hadoop library for your platform... using builtin-java classes
>>> where applicable
>>> 2014-03-11 14:52:15 o.a.m.c.k.KMeansDriver [INFO] Input: /2/sequence
>>> Clusters In: /3/clusters-0-final Out: /5
>>> 2014-03-11 14:52:15 o.a.m.c.k.KMeansDriver [INFO] convergence: 0.1 max
>>> Iterations: 100
>>> 2014-03-11 14:52:16 o.a.h.m.JobClient [WARN] Use GenericOptionsParser for
>>> parsing the arguments. Applications should implement Tool for the same.
>>> 2014-03-11 14:52:17 o.a.h.m.l.i.FileInputFormat [INFO] Total input paths
>>> to process : 3
>>> 2014-03-11 14:52:19 o.a.h.m.JobClient [INFO] Running job:
>>> job_201403111332_0011
>>> 2014-03-11 14:52:20 o.a.h.m.JobClient [INFO] map 0% reduce 0%
>>> 2014-03-11 14:52:28 o.a.h.m.JobClient [INFO] Task Id :
>>> attempt_201403111332_0011_m_000000_0, Status : FAILED
>>> 2014-03-11 14:52:28 STDIO [ERROR] java.lang.IllegalStateException:
>>> /5/clusters-0
>>>   at org.apache.mahout.common.iterator.sequencefile.SequenceFileDirValueIterable.iterator(SequenceFileDirValueIterable.java:78)
>>>   at org.apache.mahout.clustering.classify.ClusterClassifier.readFromSeqFiles(ClusterClassifier.java:208)
>>>   at org.apache.mahout.clustering.iterator.CIMapper.setup(CIMapper.java:44)
>>>   at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:138)
>>>   at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:672)
>>>   at org.apache.hadoop.mapred.MapTask.run(MapTask.java:330)
>>>   at org.apache.hadoop.mapred.Child$4.run(Child.java:268)
>>>   at java.security.AccessController.doPrivileged(Native Method)
>>>   at javax.security.auth.Subject.doAs(Subject.java:415)
>>>   at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1438)
>>>   at org.apache.hadoop.mapred.Child.main(Child.java:262)
>>> Caused by: java.io.FileNotFoundException: File /5/clusters-0
>>>
>>> Please suggest!!!
>
> --
> Thanks & Regards
> Bikash Kumar Gupta

--
Thanks & Regards
Bikash Kumar Gupta
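[Editor's note] For readers hitting the same symptom: the root cause described in the thread is that a path without a scheme resolves against whichever file system the running code treats as the default, so the driver can write clusters-0 to file:// while the mappers look for it on hdfs://. Below is a minimal, self-contained sketch of the scheme-qualification idea; the helper name `qualify` and the namenode URI are illustrative only (Hadoop's own `Path.makeQualified()` performs the real version of this step).

```java
import java.net.URI;

public class SchemeQualify {

    // If the path carries no scheme, prefix it with the default file
    // system URI; otherwise leave it untouched. This mirrors (in spirit)
    // what qualifying a Hadoop Path against fs.defaultFS does.
    static String qualify(String path, String defaultFs) {
        URI uri = URI.create(path);
        if (uri.getScheme() == null) {
            // Ambiguous path: without this step it lands on whatever
            // default is in effect, which may be file:// not hdfs://.
            return defaultFs + path;
        }
        return path;
    }

    public static void main(String[] args) {
        // Unqualified path gets pinned to HDFS explicitly.
        System.out.println(qualify("/5/clusters-0", "hdfs://namenode:8020"));
        // Already-qualified path is unchanged, so the driver's write and
        // the MR job's read agree on where clusters-0 lives.
        System.out.println(qualify("hdfs://namenode:8020/5/clusters-0",
                "hdfs://namenode:8020"));
    }
}
```

In practice this suggests passing fully qualified hdfs:// paths for both the input and output arguments (as Sebastian suggested), or making sure the client-side configuration's default file system points at HDFS, so that every Path the driver builds resolves to the same place.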
