MAHOUT-1452 has been raised
On Wed, Mar 12, 2014 at 8:26 PM, Andrew Musselman <[email protected]> wrote:

> Yes please; if you're seeing confusing behavior when you leave the hdfs
> protocol off the URI then it may need some tending.
>
> > On Mar 12, 2014, at 7:22 AM, Bikash Gupta <[email protected]> wrote:
> >
> > Should I raise a JIRA?
> >
> > On Wed, Mar 12, 2014 at 12:31 PM, Bikash Gupta <[email protected]> wrote:
> >
> >> Hi,
> >>
> >> The problem is not with the input path; it's the way KMeans is getting
> >> executed. Let me explain.
> >>
> >> I have created CSV->Sequence files using map-reduce, so my data is in
> >> HDFS. After this I have run the Canopy MR job, so that output is also
> >> in HDFS.
> >>
> >> Both of these are now fed into the KMeans MR job.
> >>
> >> If you check the KMeansDriver class, it first tries to create the
> >> clusters-0 folder with data; if you don't specify the scheme here, it
> >> will write to the local file system. After that, the MR job starts,
> >> and it expects clusters-0 in HDFS.
> >>
> >>   Path priorClustersPath = new Path(output, Cluster.INITIAL_CLUSTERS_DIR);
> >>   ClusteringPolicy policy = new KMeansClusteringPolicy(convergenceDelta);
> >>   ClusterClassifier prior = new ClusterClassifier(clusters, policy);
> >>   prior.writeToSeqFiles(priorClustersPath);
> >>
> >>   if (runSequential) {
> >>     ClusterIterator.iterateSeq(conf, input, priorClustersPath, output,
> >>         maxIterations);
> >>   } else {
> >>     ClusterIterator.iterateMR(conf, input, priorClustersPath, output,
> >>         maxIterations);
> >>   }
> >>
> >> Let me know if I am not able to explain clearly.
> >>
> >> On Wed, Mar 12, 2014 at 11:53 AM, Sebastian Schelter <[email protected]> wrote:
> >>
> >>> Hi Bikash,
> >>>
> >>> Have you tried adding hdfs:// to your input path? Maybe that helps.
> >>>
> >>> --sebastian
> >>>
> >>>> On 03/11/2014 11:22 AM, Bikash Gupta wrote:
> >>>>
> >>>> Hi,
> >>>>
> >>>> I am running KMeans in a cluster where I am setting the configuration
> >>>> of fs.hdfs.impl and fs.file.impl beforehand, as mentioned below:
> >>>>
> >>>>   conf.set("fs.hdfs.impl",
> >>>>       org.apache.hadoop.hdfs.DistributedFileSystem.class.getName());
> >>>>   conf.set("fs.file.impl",
> >>>>       org.apache.hadoop.fs.LocalFileSystem.class.getName());
> >>>>
> >>>> The problem is that the clusters-0 directory is getting created in the
> >>>> local file system while clusters-1 is getting created in HDFS, and the
> >>>> KMeans map-reduce job is unable to find clusters-0. Please see the
> >>>> stack trace below:
> >>>>
> >>>> 2014-03-11 14:52:15 o.a.m.c.AbstractJob [INFO] Command line arguments:
> >>>> {--clustering=null, --clusters=[/3/clusters-0-final],
> >>>> --convergenceDelta=[0.1],
> >>>> --distanceMeasure=[org.apache.mahout.common.distance.EuclideanDistanceMeasure],
> >>>> --endPhase=[2147483647], --input=[/2/sequence], --maxIter=[100],
> >>>> --method=[mapreduce], --output=[/5], --overwrite=null, --startPhase=[0],
> >>>> --tempDir=[temp]}
> >>>> 2014-03-11 14:52:15 o.a.h.u.NativeCodeLoader [WARN] Unable to load
> >>>> native-hadoop library for your platform... using builtin-java classes
> >>>> where applicable
> >>>> 2014-03-11 14:52:15 o.a.m.c.k.KMeansDriver [INFO] Input: /2/sequence
> >>>> Clusters In: /3/clusters-0-final Out: /5
> >>>> 2014-03-11 14:52:15 o.a.m.c.k.KMeansDriver [INFO] convergence: 0.1 max
> >>>> Iterations: 100
> >>>> 2014-03-11 14:52:16 o.a.h.m.JobClient [WARN] Use GenericOptionsParser
> >>>> for parsing the arguments. Applications should implement Tool for the
> >>>> same.
> >>>> 2014-03-11 14:52:17 o.a.h.m.l.i.FileInputFormat [INFO] Total input
> >>>> paths to process : 3
> >>>> 2014-03-11 14:52:19 o.a.h.m.JobClient [INFO] Running job:
> >>>> job_201403111332_0011
> >>>> 2014-03-11 14:52:20 o.a.h.m.JobClient [INFO] map 0% reduce 0%
> >>>> 2014-03-11 14:52:28 o.a.h.m.JobClient [INFO] Task Id :
> >>>> attempt_201403111332_0011_m_000000_0, Status : FAILED
> >>>> 2014-03-11 14:52:28 STDIO [ERROR] java.lang.IllegalStateException:
> >>>> /5/clusters-0
> >>>>     at org.apache.mahout.common.iterator.sequencefile.SequenceFileDirValueIterable.iterator(SequenceFileDirValueIterable.java:78)
> >>>>     at org.apache.mahout.clustering.classify.ClusterClassifier.readFromSeqFiles(ClusterClassifier.java:208)
> >>>>     at org.apache.mahout.clustering.iterator.CIMapper.setup(CIMapper.java:44)
> >>>>     at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:138)
> >>>>     at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:672)
> >>>>     at org.apache.hadoop.mapred.MapTask.run(MapTask.java:330)
> >>>>     at org.apache.hadoop.mapred.Child$4.run(Child.java:268)
> >>>>     at java.security.AccessController.doPrivileged(Native Method)
> >>>>     at javax.security.auth.Subject.doAs(Subject.java:415)
> >>>>     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1438)
> >>>>     at org.apache.hadoop.mapred.Child.main(Child.java:262)
> >>>> Caused by: java.io.FileNotFoundException: File /5/clusters-0
> >>>>
> >>>> Please suggest!
> >>
> >> --
> >> Thanks & Regards
> >> Bikash Kumar Gupta
> >
> > --
> > Thanks & Regards
> > Bikash Kumar Gupta

--
Thanks & Regards
Bikash Kumar Gupta
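The confusion described in the thread can be reproduced outside Hadoop: a path string without a scheme carries no filesystem information, so it ends up bound to whichever default filesystem is in effect when it is first used. A minimal sketch in plain Java (no Hadoop dependency; the namenode URI `hdfs://namenode:8020/` is an assumed example, not taken from the thread):

```java
import java.net.URI;

public class SchemeDemo {
    public static void main(String[] args) {
        // A scheme-less path says nothing about which filesystem it lives on:
        URI bare = URI.create("/5/clusters-0");
        System.out.println(bare.getScheme());  // prints "null"

        // Resolving it against the cluster's default filesystem URI pins it
        // to HDFS, which is what fully qualifying the path accomplishes:
        URI qualified = URI.create("hdfs://namenode:8020/").resolve(bare);
        System.out.println(qualified);  // prints "hdfs://namenode:8020/5/clusters-0"
    }
}
```

If the default filesystem were `file:///` instead, the same bare path would resolve to the local disk, which matches the reported behavior of clusters-0 landing locally while the MR job looks for it in HDFS.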
