MAHOUT-1452 has been raised
On Wed, Mar 12, 2014 at 8:26 PM, Andrew Musselman <[email protected]> wrote:

> Yes please; if you're seeing confusing behavior when you leave the hdfs
> protocol off the URI then it may need some tending.
>
> > On Mar 12, 2014, at 7:22 AM, Bikash Gupta <[email protected]> wrote:
> >
> > Should I raise a JIRA?
> >
> > On Wed, Mar 12, 2014 at 12:31 PM, Bikash Gupta <[email protected]> wrote:
> >
> >> Hi,
> >>
> >> The problem is not with the input path; it's the way KMeans is getting
> >> executed. Let me explain.
> >>
> >> I have created CSV->Sequence files using map-reduce, so my data is in
> >> HDFS. After this I have run the Canopy MR job, so that output is also
> >> in HDFS.
> >>
> >> Both of these are now fed into the KMeans MR job.
> >>
> >> If you check the KMeansDriver class, it first tries to create the
> >> clusters-0 folder with data; if you don't specify the scheme here, it
> >> will write to the local file system. After that, the MR job starts,
> >> and it expects clusters-0 in HDFS.
> >>
> >>   Path priorClustersPath = new Path(output, Cluster.INITIAL_CLUSTERS_DIR);
> >>   ClusteringPolicy policy = new KMeansClusteringPolicy(convergenceDelta);
> >>   ClusterClassifier prior = new ClusterClassifier(clusters, policy);
> >>   prior.writeToSeqFiles(priorClustersPath);
> >>
> >>   if (runSequential) {
> >>     ClusterIterator.iterateSeq(conf, input, priorClustersPath, output,
> >>         maxIterations);
> >>   } else {
> >>     ClusterIterator.iterateMR(conf, input, priorClustersPath, output,
> >>         maxIterations);
> >>   }
> >>
> >> Let me know if I am not able to explain clearly.
> >>
> >> On Wed, Mar 12, 2014 at 11:53 AM, Sebastian Schelter <[email protected]> wrote:
> >>
> >>> Hi Bikash,
> >>>
> >>> Have you tried adding hdfs:// to your input path? Maybe that helps.
> >>>
> >>> --sebastian
> >>>
> >>>> On 03/11/2014 11:22 AM, Bikash Gupta wrote:
> >>>>
> >>>> Hi,
> >>>>
> >>>> I am running KMeans in a cluster where I am setting the configuration
> >>>> of fs.hdfs.impl and fs.file.impl beforehand, as mentioned below:
> >>>>
> >>>>   conf.set("fs.hdfs.impl",
> >>>>       org.apache.hadoop.hdfs.DistributedFileSystem.class.getName());
> >>>>   conf.set("fs.file.impl",
> >>>>       org.apache.hadoop.fs.LocalFileSystem.class.getName());
> >>>>
> >>>> The problem is that the clusters-0 directory is getting created in the
> >>>> local file system while clusters-1 is getting created in HDFS, and the
> >>>> KMeans map-reduce job is unable to find clusters-0. Please see the
> >>>> stack trace below:
> >>>>
> >>>> 2014-03-11 14:52:15 o.a.m.c.AbstractJob [INFO] Command line arguments:
> >>>> {--clustering=null, --clusters=[/3/clusters-0-final],
> >>>> --convergenceDelta=[0.1],
> >>>> --distanceMeasure=[org.apache.mahout.common.distance.EuclideanDistanceMeasure],
> >>>> --endPhase=[2147483647], --input=[/2/sequence], --maxIter=[100],
> >>>> --method=[mapreduce], --output=[/5], --overwrite=null, --startPhase=[0],
> >>>> --tempDir=[temp]}
> >>>> 2014-03-11 14:52:15 o.a.h.u.NativeCodeLoader [WARN] Unable to load
> >>>> native-hadoop library for your platform... using builtin-java classes
> >>>> where applicable
> >>>> 2014-03-11 14:52:15 o.a.m.c.k.KMeansDriver [INFO] Input: /2/sequence
> >>>> Clusters In: /3/clusters-0-final Out: /5
> >>>> 2014-03-11 14:52:15 o.a.m.c.k.KMeansDriver [INFO] convergence: 0.1 max
> >>>> Iterations: 100
> >>>> 2014-03-11 14:52:16 o.a.h.m.JobClient [WARN] Use GenericOptionsParser
> >>>> for parsing the arguments. Applications should implement Tool for the
> >>>> same.
> >>>> 2014-03-11 14:52:17 o.a.h.m.l.i.FileInputFormat [INFO] Total input
> >>>> paths to process : 3
> >>>> 2014-03-11 14:52:19 o.a.h.m.JobClient [INFO] Running job:
> >>>> job_201403111332_0011
> >>>> 2014-03-11 14:52:20 o.a.h.m.JobClient [INFO] map 0% reduce 0%
> >>>> 2014-03-11 14:52:28 o.a.h.m.JobClient [INFO] Task Id :
> >>>> attempt_201403111332_0011_m_000000_0, Status : FAILED
> >>>> 2014-03-11 14:52:28 STDIO [ERROR] java.lang.IllegalStateException:
> >>>> /5/clusters-0
> >>>>     at org.apache.mahout.common.iterator.sequencefile.SequenceFileDirValueIterable.iterator(SequenceFileDirValueIterable.java:78)
> >>>>     at org.apache.mahout.clustering.classify.ClusterClassifier.readFromSeqFiles(ClusterClassifier.java:208)
> >>>>     at org.apache.mahout.clustering.iterator.CIMapper.setup(CIMapper.java:44)
> >>>>     at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:138)
> >>>>     at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:672)
> >>>>     at org.apache.hadoop.mapred.MapTask.run(MapTask.java:330)
> >>>>     at org.apache.hadoop.mapred.Child$4.run(Child.java:268)
> >>>>     at java.security.AccessController.doPrivileged(Native Method)
> >>>>     at javax.security.auth.Subject.doAs(Subject.java:415)
> >>>>     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1438)
> >>>>     at org.apache.hadoop.mapred.Child.main(Child.java:262)
> >>>> Caused by: java.io.FileNotFoundException: File /5/clusters-0
> >>>>
> >>>> Please suggest!
> >>
> >> --
> >> Thanks & Regards
> >> Bikash Kumar Gupta
> >
> > --
> > Thanks & Regards
> > Bikash Kumar Gupta

--
Thanks & Regards
Bikash Kumar Gupta
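The confusion described in the thread can be reproduced outside Hadoop: a path string without a scheme carries no filesystem information, so it ends up bound to whichever default filesystem is in effect when it is first used. A minimal sketch in plain Java (no Hadoop dependency; the namenode URI `hdfs://namenode:8020/` is an assumed example, not taken from the thread):

```java
import java.net.URI;

public class SchemeDemo {
    public static void main(String[] args) {
        // A scheme-less path says nothing about which filesystem it lives on:
        URI bare = URI.create("/5/clusters-0");
        System.out.println(bare.getScheme());  // prints "null"

        // Resolving it against the cluster's default filesystem URI pins it
        // to HDFS, which is what fully qualifying the path accomplishes:
        URI qualified = URI.create("hdfs://namenode:8020/").resolve(bare);
        System.out.println(qualified);  // prints "hdfs://namenode:8020/5/clusters-0"
    }
}
```

If the default filesystem were `file:///` instead, the same bare path would resolve to the local disk, which matches the reported behavior of clusters-0 landing locally while the MR job looks for it in HDFS.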
