Thank you for the reply, Chris.

I can create and write to the file system fine, and the file is there when I
check Hadoop, so I do not think the problem is privileges. As I read it,
the CanopyDriver is looking for the file under the class path
(/home/cyrille/DataWriter/src/testdata_seq/) instead of Hadoop's
(/user/cyrille/), and since the file is not there it gives me the error that
the file does not exist. But the file does exist and was created fine within
the program, with the same conf variable.

- Cyril


On Fri, Mar 29, 2013 at 12:01 PM, Chris Harrington <[email protected]> wrote:

> > security.UserGroupInformation:
> > PriviledgedActionException as:cyril
>
> I'm not entirely sure, but it sounds like a permissions issue to me. Check
> that all the files are owned by the user cyril and not by root.
> Also, did you start hadoop as root and run the program as cyril? Hadoop
> might complain about that too.
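>
> If you want to check ownership from inside the program rather than from
> the shell, a quick sketch along these lines (standard FileStatus API;
> swap in your own fs and path) should do it:
>
>     // Print who owns the input file and with what permissions.
>     FileStatus status = fs.getFileStatus(new Path("testdata_seq/data"));
>     System.out.println("Owner: " + status.getOwner()
>             + " group: " + status.getGroup()
>             + " perms: " + status.getPermission());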
>
> On 29 Mar 2013, at 15:54, Cyril Bogus wrote:
>
> > Hi,
> >
> > I am running a small Java program that writes a small input data set to
> > the Hadoop FileSystem, runs Mahout Canopy and KMeans clustering, and
> > then prints the content of the resulting data.
> >
> > In my hadoop.properties I have included the core-site.xml definition so
> > that the Java program connects to my single-node setup and uses Hadoop
> > rather than the local project file system (basically, all writes and
> > reads are done on Hadoop, not in the class path).
> >
> > When I run the program, the Canopy configuration (and even the KMeans
> > one) tries to look up the file in the class path instead of the Hadoop
> > FileSystem path where the proper files are located.
> >
> > Is there a problem with the way I have my conf defined?
> >
> > hadoop.properties:
> > fs.default.name=hdfs://mylocation
> >
> > Program:
> >
> > import java.io.File;
> > import java.io.FileReader;
> > import java.io.IOException;
> > import java.util.LinkedList;
> > import java.util.List;
> > import java.util.Properties;
> >
> > import org.apache.hadoop.conf.Configuration;
> > import org.apache.hadoop.fs.FileSystem;
> > import org.apache.hadoop.fs.Path;
> > import org.apache.hadoop.io.IntWritable;
> > import org.apache.hadoop.io.SequenceFile;
> > import org.apache.hadoop.io.Text;
> > import org.apache.mahout.clustering.canopy.CanopyDriver;
> > import org.apache.mahout.clustering.classify.WeightedVectorWritable;
> > import org.apache.mahout.clustering.kmeans.KMeansDriver;
> > import org.apache.mahout.common.distance.EuclideanDistanceMeasure;
> > import org.apache.mahout.math.DenseVector;
> > import org.apache.mahout.math.NamedVector;
> > import org.apache.mahout.math.VectorWritable;
> >
> > public class DataFileWriter {
> >
> >     private static Properties props = new Properties();
> >     private static Configuration conf = new Configuration();
> >
> >     /**
> >      * @param args
> >      * @throws ClassNotFoundException
> >      * @throws InterruptedException
> >      * @throws IOException
> >      */
> >     public static void main(String[] args) throws IOException,
> >             InterruptedException, ClassNotFoundException {
> >
> >         props.load(new FileReader(new File(
> >                 "/home/cyril/workspace/Newer/src/hadoop.properties")));
> >
> >         FileSystem fs = null;
> >         SequenceFile.Writer writer;
> >         SequenceFile.Reader reader;
> >
> >         // Point the conf at the single-node cluster from hadoop.properties.
> >         conf.set("fs.default.name", props.getProperty("fs.default.name"));
> >
> >         List<NamedVector> vectors = new LinkedList<NamedVector>();
> >         NamedVector v1 = new NamedVector(new DenseVector(new double[] {
> >                 0.1, 0.2, 0.5 }), "Hello");
> >         vectors.add(v1);
> >         v1 = new NamedVector(new DenseVector(new double[] { 0.5, 0.1, 0.2 }),
> >                 "Bored");
> >         vectors.add(v1);
> >         v1 = new NamedVector(new DenseVector(new double[] { 0.2, 0.5, 0.1 }),
> >                 "Done");
> >         vectors.add(v1);
> >
> >         // Write the data to a SequenceFile.
> >         try {
> >             fs = FileSystem.get(conf);
> >
> >             Path path = new Path("testdata_seq/data");
> >             writer = new SequenceFile.Writer(fs, conf, path, Text.class,
> >                     VectorWritable.class);
> >
> >             VectorWritable vec = new VectorWritable();
> >             for (NamedVector vector : vectors) {
> >                 vec.set(vector);
> >                 writer.append(new Text(vector.getName()), vec);
> >             }
> >             writer.close();
> >         } catch (Exception e) {
> >             System.out.println("ERROR: " + e);
> >         }
> >
> >         Path input = new Path("testdata_seq/data");
> >         boolean runSequential = false;
> >         Path clustersOut = new Path("testdata_seq/clusters");
> >         Path clustersIn = new Path("testdata_seq/clusters/clusters-0-final");
> >         double convergenceDelta = 0;
> >         double clusterClassificationThreshold = 0;
> >         boolean runClustering = true;
> >         Path output = new Path("testdata_seq/output");
> >         int maxIterations = 12;
> >
> >         CanopyDriver.run(conf, input, clustersOut,
> >                 new EuclideanDistanceMeasure(), 1, 1, 1, 1, 0,
> >                 runClustering, clusterClassificationThreshold, runSequential);
> >         KMeansDriver.run(conf, input, clustersIn, output,
> >                 new EuclideanDistanceMeasure(), convergenceDelta,
> >                 maxIterations, runClustering,
> >                 clusterClassificationThreshold, runSequential);
> >
> >         // Read back the clustered points.
> >         reader = new SequenceFile.Reader(fs,
> >                 new Path("testdata_seq/clusteredPoints/part-m-00000"), conf);
> >
> >         IntWritable key = new IntWritable();
> >         WeightedVectorWritable value = new WeightedVectorWritable();
> >         while (reader.next(key, value)) {
> >             System.out.println(value.toString() + " belongs to cluster "
> >                     + key.toString());
> >         }
> >         reader.close();
> >     }
> > }
> >
> > Error Output:
> >
> > .......
> > 13/03/29 11:47:15 ERROR security.UserGroupInformation:
> > PriviledgedActionException as:cyril
> > cause:org.apache.hadoop.mapreduce.lib.input.InvalidInputException: Input
> > path does not exist: file:/home/cyril/workspace/Newer/testdata_seq/data
> > Exception in thread "main"
> > org.apache.hadoop.mapreduce.lib.input.InvalidInputException: Input path
> > does not exist: file:/home/cyril/workspace/Newer/testdata_seq/data
> >     at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.listStatus(FileInputFormat.java:235)
> >     at org.apache.hadoop.mapreduce.lib.input.SequenceFileInputFormat.listStatus(SequenceFileInputFormat.java:55)
> >     at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.getSplits(FileInputFormat.java:252)
> >     at org.apache.hadoop.mapred.JobClient.writeNewSplits(JobClient.java:962)
> >     at org.apache.hadoop.mapred.JobClient.writeSplits(JobClient.java:979)
> >     at org.apache.hadoop.mapred.JobClient.access$600(JobClient.java:174)
> >     at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:897)
> >     at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:850)
> >     at java.security.AccessController.doPrivileged(Native Method)
> >     at javax.security.auth.Subject.doAs(Subject.java:416)
> >     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1121)
> >     at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:850)
> >     at org.apache.hadoop.mapreduce.Job.submit(Job.java:500)
> >     at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:530)
> >     at org.apache.mahout.clustering.classify.ClusterClassificationDriver.classifyClusterMR(ClusterClassificationDriver.java:275)
> >     at org.apache.mahout.clustering.classify.ClusterClassificationDriver.run(ClusterClassificationDriver.java:135)
> >     at org.apache.mahout.clustering.canopy.CanopyDriver.clusterData(CanopyDriver.java:372)
> >     at org.apache.mahout.clustering.canopy.CanopyDriver.run(CanopyDriver.java:158)
> >     at DataFileWriter.main(DataFileWriter.java:85)
> >
> >
> >
> >
> > On another note, is there a command that would allow the program to
> > overwrite existing files in the filesystem? (I get errors if I don't
> > delete the files before running the program again.)
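> >
> > One thing I am considering (a sketch using the standard FileSystem.delete
> > API, with the output paths from my program) is deleting the old outputs
> > before each run:
> >
> >     // Recursively delete the previous run's outputs; delete() just
> >     // returns false if the path does not exist.
> >     fs.delete(new Path("testdata_seq/clusters"), true);
> >     fs.delete(new Path("testdata_seq/output"), true);
> >     fs.delete(new Path("testdata_seq/clusteredPoints"), true);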
> >
> > Thank you for a reply; I hope I have given all the necessary output. In
> > the meantime I will keep looking into it.
> >
> > Cyril
>
>
