Whoops, sorry about the empty mail last time. I have one last suggestion, though I'm not sure it'll work:
you could try putting the path names as hdfs://testdata_seq/clusters. Apart from that I'm out of ideas.
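
Something like this is what I mean. Rough, untested sketch; I'm guessing at the namenode address ("mylocation:9000"), so swap in whatever your core-site.xml actually says. Note the colon after hdfs, and makeQualified() pins the relative paths to the configured filesystem instead of leaving the drivers to resolve them:

    Configuration conf = new Configuration();
    // note the colon after "hdfs"; host:port is a guess, use your namenode's
    conf.set("fs.default.name", "hdfs://mylocation:9000");
    FileSystem fs = FileSystem.get(conf);

    // qualify relative paths against HDFS so the Mahout drivers
    // can't fall back to the local filesystem
    Path input = fs.makeQualified(new Path("testdata_seq/data"));
    Path clustersOut = fs.makeQualified(new Path("testdata_seq/clusters"));
    Path output = fs.makeQualified(new Path("testdata_seq/output"));

(There's also a PS at the bottom about your overwrite question.)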

On 29 Mar 2013, at 17:05, Cyril Bogus wrote:

> Thank you again Chris.
>
> Yes, it is a typo.
>
> After careful reading of the output, my program is doing exactly what you
> describe. I am trying to do everything on the Hadoop fs, but it is creating
> files on both the Hadoop fs and the class fs, and some files are missing.
> When I run AND copy the missing file from the Hadoop fs into the class fs,
> I get the proper output (no errors). I also get the proper output when I do
> everything within the class fs (by removing the property from conf).
>
> But I am trying to automate everything to run on my three-node cluster for
> testing within Java, so I need to be able to do everything on the Hadoop fs.
> I will look into setting up Mahout with a proper conf file.
>
> - Cyril
>
>
> On Fri, Mar 29, 2013 at 12:34 PM, Chris Harrington <[email protected]> wrote:
>
>> Well then, do all the various folders exist on the Hadoop fs?
>>
>> I also had a similar problem a while ago where my program ran fine, but
>> then I did something (no idea what) and Hadoop started complaining. To fix
>> it I had to put everything on the Hadoop fs, i.e. move all
>> <local fs path to>/data to <hadoop fs path to>/data.
>>
>> One more strange issue I ran into was where I had identically named
>> folders on both local and hdfs and it was looking in the wrong one.
>>
>> I think that's all the causes I've run into, so if they're not the cause
>> then I'm out of ideas and hopefully someone else will be able to help.
>>
>> Also, the missing colon is a typo, right? hdfs//mylocation
>>
>> On 29 Mar 2013, at 16:09, Cyril Bogus wrote:
>>
>>> Thank you for the reply Chris,
>>>
>>> I create and write fine on the file system, and the file is there when I
>>> check Hadoop, so I do not think the problem is privileges. As I read it,
>>> the Canopy Driver is looking for the file under the class fs
>>> (/home/cyrille/DataWriter/src/testdata_seq/) instead of Hadoop's
>>> (/user/cyrille/), and the file is not there, so it gives me the error
>>> that the file does not exist. But the file exists and was created fine
>>> "within the program with the same conf variable".
>>>
>>> - Cyril
>>>
>>>
>>> On Fri, Mar 29, 2013 at 12:01 PM, Chris Harrington <[email protected]> wrote:
>>>
>>>>> security.UserGroupInformation:
>>>>> PriviledgedActionException as:cyril
>>>>
>>>> I'm not entirely sure, but that sounds like a permissions issue to me.
>>>> Check that all the files are owned by the user cyril and not root.
>>>> Also, did you start Hadoop as root and run the program as cyril? Hadoop
>>>> might complain about that too.
>>>>
>>>> On 29 Mar 2013, at 15:54, Cyril Bogus wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> I am running a small Java program that writes a small input data set
>>>>> to the Hadoop FileSystem, runs Mahout Canopy and KMeans clustering,
>>>>> and then outputs the content of the data.
>>>>>
>>>>> In my hadoop.properties I have included the core-site.xml definition
>>>>> so that the Java program connects to my single-node setup and does not
>>>>> use the Java project's file system but Hadoop's instead (basically all
>>>>> writes and reads are done on Hadoop and not in the class path).
>>>>>
>>>>> When I run the program, the Canopy (and even the KMeans) configuration
>>>>> tries to look up the file in the class path instead of the Hadoop
>>>>> FileSystem path where the proper files are located.
>>>>>
>>>>> Is there a problem with the way I have my conf defined?
>>>>>
>>>>> hadoop.properties:
>>>>> fs.default.name=hdfs//mylocation
>>>>>
>>>>> Program:
>>>>>
>>>>> public class DataFileWriter {
>>>>>
>>>>>     private static Properties props = new Properties();
>>>>>     private static Configuration conf = new Configuration();
>>>>>
>>>>>     /**
>>>>>      * @param args
>>>>>      * @throws ClassNotFoundException
>>>>>      * @throws InterruptedException
>>>>>      * @throws IOException
>>>>>      */
>>>>>     public static void main(String[] args) throws IOException,
>>>>>             InterruptedException, ClassNotFoundException {
>>>>>
>>>>>         props.load(new FileReader(new File(
>>>>>                 "/home/cyril/workspace/Newer/src/hadoop.properties")));
>>>>>
>>>>>         FileSystem fs = null;
>>>>>         SequenceFile.Writer writer;
>>>>>         SequenceFile.Reader reader;
>>>>>
>>>>>         conf.set("fs.default.name", props.getProperty("fs.default.name"));
>>>>>
>>>>>         // three named points to cluster
>>>>>         List<NamedVector> vectors = new LinkedList<NamedVector>();
>>>>>         NamedVector v1 = new NamedVector(new DenseVector(
>>>>>                 new double[] { 0.1, 0.2, 0.5 }), "Hello");
>>>>>         vectors.add(v1);
>>>>>         v1 = new NamedVector(new DenseVector(
>>>>>                 new double[] { 0.5, 0.1, 0.2 }), "Bored");
>>>>>         vectors.add(v1);
>>>>>         v1 = new NamedVector(new DenseVector(
>>>>>                 new double[] { 0.2, 0.5, 0.1 }), "Done");
>>>>>         vectors.add(v1);
>>>>>
>>>>>         // Write the data to a SequenceFile
>>>>>         try {
>>>>>             fs = FileSystem.get(conf);
>>>>>
>>>>>             Path path = new Path("testdata_seq/data");
>>>>>             writer = new SequenceFile.Writer(fs, conf, path,
>>>>>                     Text.class, VectorWritable.class);
>>>>>
>>>>>             VectorWritable vec = new VectorWritable();
>>>>>             for (NamedVector vector : vectors) {
>>>>>                 vec.set(vector);
>>>>>                 writer.append(new Text(vector.getName()), vec);
>>>>>             }
>>>>>             writer.close();
>>>>>         } catch (Exception e) {
>>>>>             System.out.println("ERROR: " + e);
>>>>>         }
>>>>>
>>>>>         Path input = new Path("testdata_seq/data");
>>>>>         boolean runSequential = false;
>>>>>         Path clustersOut = new Path("testdata_seq/clusters");
>>>>>         Path clustersIn = new Path("testdata_seq/clusters/clusters-0-final");
>>>>>         double convergenceDelta = 0;
>>>>>         double clusterClassificationThreshold = 0;
>>>>>         boolean runClustering = true;
>>>>>         Path output = new Path("testdata_seq/output");
>>>>>         int maxIterations = 12;
>>>>>
>>>>>         // seed the clusters with Canopy, then run KMeans
>>>>>         CanopyDriver.run(conf, input, clustersOut,
>>>>>                 new EuclideanDistanceMeasure(), 1, 1, 1, 1, 0,
>>>>>                 runClustering, clusterClassificationThreshold,
>>>>>                 runSequential);
>>>>>         KMeansDriver.run(conf, input, clustersIn, output,
>>>>>                 new EuclideanDistanceMeasure(), convergenceDelta,
>>>>>                 maxIterations, runClustering,
>>>>>                 clusterClassificationThreshold, runSequential);
>>>>>
>>>>>         // print each clustered point with its cluster id
>>>>>         reader = new SequenceFile.Reader(fs,
>>>>>                 new Path("testdata_seq/clusteredPoints/part-m-00000"),
>>>>>                 conf);
>>>>>
>>>>>         IntWritable key = new IntWritable();
>>>>>         WeightedVectorWritable value = new WeightedVectorWritable();
>>>>>         while (reader.next(key, value)) {
>>>>>             System.out.println(value.toString() + " belongs to cluster "
>>>>>                     + key.toString());
>>>>>         }
>>>>>     }
>>>>> }
>>>>>
>>>>> Error Output:
>>>>>
>>>>> .......
>>>>> 13/03/29 11:47:15 ERROR security.UserGroupInformation:
>>>>> PriviledgedActionException as:cyril
>>>>> cause:org.apache.hadoop.mapreduce.lib.input.InvalidInputException:
>>>>> Input path does not exist: file:/home/cyril/workspace/Newer/testdata_seq/data
>>>>> Exception in thread "main"
>>>>> org.apache.hadoop.mapreduce.lib.input.InvalidInputException: Input path
>>>>> does not exist: file:/home/cyril/workspace/Newer/testdata_seq/data
>>>>>     at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.listStatus(FileInputFormat.java:235)
>>>>>     at org.apache.hadoop.mapreduce.lib.input.SequenceFileInputFormat.listStatus(SequenceFileInputFormat.java:55)
>>>>>     at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.getSplits(FileInputFormat.java:252)
>>>>>     at org.apache.hadoop.mapred.JobClient.writeNewSplits(JobClient.java:962)
>>>>>     at org.apache.hadoop.mapred.JobClient.writeSplits(JobClient.java:979)
>>>>>     at org.apache.hadoop.mapred.JobClient.access$600(JobClient.java:174)
>>>>>     at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:897)
>>>>>     at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:850)
>>>>>     at java.security.AccessController.doPrivileged(Native Method)
>>>>>     at javax.security.auth.Subject.doAs(Subject.java:416)
>>>>>     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1121)
>>>>>     at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:850)
>>>>>     at org.apache.hadoop.mapreduce.Job.submit(Job.java:500)
>>>>>     at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:530)
>>>>>     at org.apache.mahout.clustering.classify.ClusterClassificationDriver.classifyClusterMR(ClusterClassificationDriver.java:275)
>>>>>     at org.apache.mahout.clustering.classify.ClusterClassificationDriver.run(ClusterClassificationDriver.java:135)
>>>>>     at org.apache.mahout.clustering.canopy.CanopyDriver.clusterData(CanopyDriver.java:372)
>>>>>     at org.apache.mahout.clustering.canopy.CanopyDriver.run(CanopyDriver.java:158)
>>>>>     at DataFileWriter.main(DataFileWriter.java:85)
>>>>>
>>>>> On another note: is there a command that would allow the program to
>>>>> overwrite existing files in the filesystem? (I get errors if I don't
>>>>> delete the files before running the program again.)
>>>>>
>>>>> Thank you for a reply, and I hope I have given all the necessary
>>>>> output. In the meantime I will look into it.
>>>>>
>>>>> Cyril
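
PS re the overwrite question above: I don't think those driver calls have an overwrite switch, but you can just delete the previous run's output before you kick things off. Untested sketch, reusing the fs from your program; adjust the list of directories to whatever your run actually creates:

    // wipe the outputs of the previous run so Hadoop doesn't
    // refuse to create them again; "true" means recursive
    for (String dir : new String[] { "testdata_seq/clusters",
            "testdata_seq/output", "testdata_seq/clusteredPoints" }) {
        Path p = fs.makeQualified(new Path(dir));
        if (fs.exists(p)) {
            fs.delete(p, true);
        }
    }

I think Mahout's org.apache.mahout.common.HadoopUtil.delete(conf, paths) does much the same thing, if you'd rather use that.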
