Hi,
I realise my previous question may have been a bit naïve, and I also realise
I am asking an awful lot here, but any advice would be greatly appreciated.
* I have been using Hadoop 2.4 in local mode and am sticking to the
mapreduce.* side of the track.
* I am using a Custom Line reader to read each sequence into a Map
* I have a partitioner class which is testing the key from the map class.
* I've tried debugging in Eclipse with a breakpoint in the partitioner class,
but getPartition(LongWritable mapKey, Text sequenceString, int
numReduceTasks) is never called.
Could there be any reason for that?
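For illustration, the partitioning decision itself can be exercised outside
Hadoop. The sketch below is not the actual ConcordanceSequencePartitioner: the
class name PartitionSketch, the method partitionFor, and the modulo scheme are
all assumptions, chosen only to mirror the shape of
getPartition(LongWritable, Text, int) with plain Java types.

```java
public class PartitionSketch {

    // Hypothetical stand-in for getPartition(LongWritable, Text, int):
    // maps a long key to a partition index in [0, numReduceTasks).
    static int partitionFor(long mapKey, int numReduceTasks) {
        if (numReduceTasks <= 0) {
            throw new IllegalArgumentException("numReduceTasks must be positive");
        }
        // Math.floorMod keeps the result non-negative even for negative keys.
        return (int) Math.floorMod(mapKey, (long) numReduceTasks);
    }

    public static void main(String[] args) {
        // With three reduce tasks the keys are spread over partitions 0..2.
        for (long key = 0; key < 6; key++) {
            System.out.println("key " + key + " -> partition " + partitionFor(key, 3));
        }
        // With one reduce task every key can only land in partition 0.
        System.out.println("single reducer: " + partitionFor(42L, 1));
    }
}
```

With a single reduce task there is nothing for a partitioner to decide, and as
far as I understand Hadoop's map-side collector only consults the configured
partitioner when the job has more than one reduce task. It may therefore be
worth checking the number of reduce tasks when the breakpoint is never hit.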
Because my map and reduce code works in local mode within Eclipse, I
wondered whether I might get the partitioner to work by switching to
pseudo-distributed mode and exporting a runnable jar from Eclipse (Kepler).
I am seeing several faults, both in pseudo-distributed mode on my own computer
and in the pseudo-distributed setup I created on the university cluster. I've
googled and read extensively but am not seeing a solution to any of these issues.
I get this warning when the job is submitted:
14/06/27 11:45:27 WARN mapreduce.JobSubmitter: No job jar file set. User
classes may not be found. See Job or Job#setJar(String).
My driver code is:
private void doParallelConcordance() throws Exception {
    Path inDir = new Path("input_sequences/10_sequences.txt");
    Path outDir = new Path("demo_output");

    Job job = Job.getInstance(new Configuration());
    // Ask Hadoop to ship the jar that contains the driver class.
    job.setJarByClass(ParallelGeneticAlignment.class);

    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);

    job.setInputFormatClass(CustomFileInputFormat.class);
    job.setMapperClass(ConcordanceMapper.class);
    job.setPartitionerClass(ConcordanceSequencePartitioner.class);
    job.setReducerClass(ConcordanceReducer.class);

    FileInputFormat.addInputPath(job, inDir);
    FileOutputFormat.setOutputPath(job, outDir);

    // Submit the job and block until it finishes, printing progress.
    job.waitForCompletion(true);
}
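The "No job jar file set" warning is consistent with Job#setJarByClass not
finding a jar that contains the given class, which as far as I understand is
what happens when the classes are loaded from a build directory (such as an
Eclipse output folder) rather than from a jar. A minimal sketch for checking
where a class was loaded from, using only the JDK; the class name
JarLocationCheck and its two helper methods are hypothetical, and in the real
driver ParallelGeneticAlignment.class would be passed instead of the sketch's
own class.

```java
import java.security.CodeSource;

public class JarLocationCheck {

    // Returns the location a class was loaded from, or null when the JVM
    // does not expose one (for example, for core JDK classes).
    static String codeSourceOf(Class<?> cls) {
        CodeSource source = cls.getProtectionDomain().getCodeSource();
        return (source == null || source.getLocation() == null)
                ? null
                : source.getLocation().toString();
    }

    // True only when the location looks like a jar file rather than a directory.
    static boolean isJarLocation(String location) {
        return location != null && location.endsWith(".jar");
    }

    public static void main(String[] args) {
        String location = codeSourceOf(JarLocationCheck.class);
        System.out.println("Loaded from: " + location);
        System.out.println("Inside a jar: " + isJarLocation(location));
    }
}
```

If the reported location is a directory, the job has no jar to ship to the
task JVMs, which would fit the ClassNotFoundException seen in the task logs.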
On the university server I am getting this error:
4/06/27 11:45:40 INFO mapreduce.Job: Task Id :
attempt_1403860966764_0003_m_000000_0, Status : FAILED
Error: java.lang.RuntimeException: java.lang.ClassNotFoundException: Class
par.gene.align.concordance.ConcordanceMapper not found
On my machine the error is:
4/06/27 12:58:03 INFO mapreduce.Job: Task Id :
attempt_1403864060032_0004_r_000000_2, Status : FAILED
Error: java.lang.RuntimeException: java.lang.ClassNotFoundException: Class
par.gene.align.concordance.ConcordanceReducer not found
On the university server I get total paths to process:
14/06/27 11:45:27 INFO input.FileInputFormat: Total input paths to process : 1
14/06/27 11:45:28 INFO mapreduce.JobSubmitter: number of splits:1
On my machine I get total paths to process:
14/06/27 12:57:09 INFO input.FileInputFormat: Total input paths to process : 0
14/06/27 12:57:36 INFO mapreduce.JobSubmitter: number of splits:0
Being new to this community, I thought it polite to introduce myself. I'm
planning to return to software development via an MSc at Heriot-Watt
University in Edinburgh. My MSc project is based on Fosters Genetic Sequence
Alignment. I have written a sequential version; my goal is now to port it to
Hadoop.
Thanks in advance,
Regards,
Chris MacKenzie