Re: Problem with FileSystem in Kmeans

Andrew Musselman Wed, 12 Mar 2014 07:58:16 -0700

Yes please; if you're seeing confusing behavior when you leave the hdfs 
protocol off the URI then it may need some tending.
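
For reference, the behavior discussed in this thread can be illustrated
without Hadoop at all. The sketch below uses only java.net.URI to mimic how
a path with no scheme ends up on whatever the default filesystem is, while a
path with an explicit hdfs:// scheme does not. The class name, the qualify
helper, and the namenode:8020 address are all hypothetical and are not part
of Hadoop or Mahout; this approximates the resolution rule and is not the
actual Hadoop implementation.

```java
import java.net.URI;

// Stdlib-only illustration of scheme fallback; this is NOT Hadoop's API.
final class SchemeResolutionSketch {

    // Hypothetical helper: qualify a path string against a default
    // filesystem URI. A path that already carries a scheme is returned
    // unchanged; a scheme-less path inherits scheme and authority from
    // the default.
    static URI qualify(String path, URI defaultFs) {
        URI uri = URI.create(path);
        if (uri.getScheme() != null) {
            return uri; // explicit scheme wins over the default
        }
        return defaultFs.resolve(uri);
    }

    public static void main(String[] args) {
        URI local = URI.create("file:///");
        // Assumed address, for illustration only.
        URI hdfs = URI.create("hdfs://namenode:8020/");

        // Same scheme-less path, two different defaults.
        System.out.println(qualify("/5/clusters-0", local));
        System.out.println(qualify("/5/clusters-0", hdfs));

        // An explicit scheme is not affected by the default.
        System.out.println(qualify("hdfs://namenode:8020/5/clusters-0", local));
    }
}
```

With a local default, the scheme-less /5/clusters-0 resolves to the local
filesystem, which matches the symptom reported for clusters-0 below.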


> On Mar 12, 2014, at 7:22 AM, Bikash Gupta <[email protected]> wrote:
> 
> Should I raise a JIRA?
> 
> 
> On Wed, Mar 12, 2014 at 12:31 PM, Bikash Gupta <[email protected]> wrote:
> 
>> Hi,
>> 
>> The problem is not with the input path; it's the way KMeans is executed.
>> Let me explain.
>> 
>> I converted CSV to sequence files using MapReduce, so my data is in HDFS.
>> After that I ran the Canopy MR job, so its output is also in HDFS.
>> 
>> Both of these are then passed to the KMeans MR job.
>> 
>> If you check the KMeansDriver class, it first tries to create the
>> clusters-0 folder with data. If you don't specify the scheme here, it
>> writes to the local file system. After that the MR job starts, and it
>> expects clusters-0 to be in HDFS.
>> 
>> Path priorClustersPath = new Path(output, Cluster.INITIAL_CLUSTERS_DIR);
>> ClusteringPolicy policy = new KMeansClusteringPolicy(convergenceDelta);
>> ClusterClassifier prior = new ClusterClassifier(clusters, policy);
>> prior.writeToSeqFiles(priorClustersPath);
>> 
>> if (runSequential) {
>>   ClusterIterator.iterateSeq(conf, input, priorClustersPath, output, maxIterations);
>> } else {
>>   ClusterIterator.iterateMR(conf, input, priorClustersPath, output, maxIterations);
>> }
>> 
>> Let me know if my explanation is not clear.
>> 
>> 
>> 
>> On Wed, Mar 12, 2014 at 11:53 AM, Sebastian Schelter <[email protected]> wrote:
>> 
>>> Hi Bikash,
>>> 
>>> Have you tried adding hdfs:// to your input path? Maybe that helps.
>>> 
>>> --sebastian
>>> 
>>> 
>>>> On 03/11/2014 11:22 AM, Bikash Gupta wrote:
>>>> 
>>>> Hi,
>>>> 
>>>> I am running KMeans on a cluster, where I set fs.hdfs.impl and
>>>> fs.file.impl in the configuration beforehand, as shown below:
>>>> 
>>>> conf.set("fs.hdfs.impl", org.apache.hadoop.hdfs.DistributedFileSystem.class.getName());
>>>> conf.set("fs.file.impl", org.apache.hadoop.fs.LocalFileSystem.class.getName());
>>>> 
>>>> The problem is that the clusters-0 directory is created in the local
>>>> file system while clusters-1 is created in HDFS, so the KMeans MapReduce
>>>> job is unable to find clusters-0. Please see the stack trace below:
>>>> 
>>>> 2014-03-11 14:52:15 o.a.m.c.AbstractJob [INFO] Command line arguments: {--clustering=null, --clusters=[/3/clusters-0-final], --convergenceDelta=[0.1], --distanceMeasure=[org.apache.mahout.common.distance.EuclideanDistanceMeasure], --endPhase=[2147483647], --input=[/2/sequence], --maxIter=[100], --method=[mapreduce], --output=[/5], --overwrite=null, --startPhase=[0], --tempDir=[temp]}
>>>> 2014-03-11 14:52:15 o.a.h.u.NativeCodeLoader [WARN] Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
>>>> 2014-03-11 14:52:15 o.a.m.c.k.KMeansDriver [INFO] Input: /2/sequence Clusters In: /3/clusters-0-final Out: /5
>>>> 2014-03-11 14:52:15 o.a.m.c.k.KMeansDriver [INFO] convergence: 0.1 max Iterations: 100
>>>> 2014-03-11 14:52:16 o.a.h.m.JobClient [WARN] Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
>>>> 2014-03-11 14:52:17 o.a.h.m.l.i.FileInputFormat [INFO] Total input paths to process : 3
>>>> 2014-03-11 14:52:19 o.a.h.m.JobClient [INFO] Running job: job_201403111332_0011
>>>> 2014-03-11 14:52:20 o.a.h.m.JobClient [INFO]  map 0% reduce 0%
>>>> 2014-03-11 14:52:28 o.a.h.m.JobClient [INFO] Task Id : attempt_201403111332_0011_m_000000_0, Status : FAILED
>>>> 2014-03-11 14:52:28 STDIO [ERROR] java.lang.IllegalStateException: /5/clusters-0
>>>>         at org.apache.mahout.common.iterator.sequencefile.SequenceFileDirValueIterable.iterator(SequenceFileDirValueIterable.java:78)
>>>>         at org.apache.mahout.clustering.classify.ClusterClassifier.readFromSeqFiles(ClusterClassifier.java:208)
>>>>         at org.apache.mahout.clustering.iterator.CIMapper.setup(CIMapper.java:44)
>>>>         at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:138)
>>>>         at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:672)
>>>>         at org.apache.hadoop.mapred.MapTask.run(MapTask.java:330)
>>>>         at org.apache.hadoop.mapred.Child$4.run(Child.java:268)
>>>>         at java.security.AccessController.doPrivileged(Native Method)
>>>>         at javax.security.auth.Subject.doAs(Subject.java:415)
>>>>         at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1438)
>>>>         at org.apache.hadoop.mapred.Child.main(Child.java:262)
>>>> Caused by: java.io.FileNotFoundException: File /5/clusters-0
>>>> 
>>>> Please suggest!!!
>> 
>> 
>> --
>> Thanks & Regards
>> Bikash Kumar Gupta
> 
> 
> 
> -- 
> Thanks & Regards
> Bikash Kumar Gupta
