Partitioning and setup errors

Chris MacKenzie Fri, 27 Jun 2014 05:16:34 -0700

Hi,

I realise my previous question may have been a bit naïve, and I also realise
I am asking an awful lot here. Any advice would be greatly appreciated.
* I have been using Hadoop 2.4 in local mode and am sticking to the
mapreduce.* API.
* I am using a custom line reader to read each sequence into a map task.
* I have a partitioner class which tests the key emitted by the map class.
* I've tried debugging in Eclipse with a breakpoint in the partitioner class,
but getPartition(LongWritable mapKey, Text sequenceString, int
numReduceTasks) is not being called.
Could there be any reason for that?
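
One thing worth checking here: the driver further down never calls job.setNumReduceTasks(n), and as far as I understand a job then runs with a single reduce task. With one reducer every record belongs to partition 0, so the framework may never need to consult a custom getPartition. The sketch below is plain Java with no Hadoop dependency. It uses the same arithmetic as Hadoop's HashPartitioner to show that one reducer leaves nothing to decide. The class name, method name and sequence keys are made up for illustration.

```java
class PartitionSketch {
    // Same arithmetic as Hadoop's HashPartitioner: clear the sign bit of the
    // hash, then take the remainder by the number of reduce tasks.
    static int partitionFor(Object key, int numReduceTasks) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    }

    public static void main(String[] args) {
        // Made-up keys standing in for whatever the mapper emits.
        String[] keys = {"ACGT", "TTGA", "GGCC", "CATG"};
        for (int reducers : new int[] {1, 3}) {
            for (String key : keys) {
                System.out.println(reducers + " reducer(s): " + key
                        + " -> partition " + partitionFor(key, reducers));
            }
        }
    }
}
```

With one reducer every key maps to partition 0. If this is the cause, setting more than one reduce task in the driver should make the breakpoint in getPartition reachable.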


Because my map and reduce code works in local mode within Eclipse, I
wondered whether I could get the partitioner to work by switching to
pseudo-distributed mode and exporting a runnable jar from Eclipse (Kepler).

I am seeing several faults in pseudo-distributed mode, both on my own
computer and on the university cluster, which I set up myself. I've googled
and read extensively but have not found a solution to any of these issues.

I get this warning when the job is submitted:
14/06/27 11:45:27 WARN mapreduce.JobSubmitter: No job jar file set.  User
classes may not be found. See Job or Job#setJar(String).
My driver code is:
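// Imports used by this method (it lives inside ParallelGeneticAlignment):
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

private void doParallelConcordance() throws Exception {
    // Input file and output directory, both relative paths
    Path inDir = new Path("input_sequences/10_sequences.txt");
    Path outDir = new Path("demo_output");

    Job job = Job.getInstance(new Configuration());
    job.setJarByClass(ParallelGeneticAlignment.class);

    // Types of the job's final output key and value
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);

    // Custom input format, mapper, partitioner and reducer
    job.setInputFormatClass(CustomFileInputFormat.class);
    job.setMapperClass(ConcordanceMapper.class);
    job.setPartitionerClass(ConcordanceSequencePartitioner.class);
    job.setReducerClass(ConcordanceReducer.class);

    FileInputFormat.addInputPath(job, inDir);
    FileOutputFormat.setOutputPath(job, outDir);

    // Submit the job and block until it finishes, printing progress
    job.waitForCompletion(true);
}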


On the university server I am getting this error:
4/06/27 11:45:40 INFO mapreduce.Job: Task Id :
attempt_1403860966764_0003_m_000000_0, Status : FAILED
Error: java.lang.RuntimeException: java.lang.ClassNotFoundException: Class
par.gene.align.concordance.ConcordanceMapper not found

On my machine the error is:
4/06/27 12:58:03 INFO mapreduce.Job: Task Id :
attempt_1403864060032_0004_r_000000_2, Status : FAILED
Error: java.lang.RuntimeException: java.lang.ClassNotFoundException: Class
par.gene.align.concordance.ConcordanceReducer not found
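
Both failures are ClassNotFoundException for my own classes, which fits the "No job jar file set" warning above. My understanding, which I have not confirmed, is that job.setJarByClass can only set a job jar if the class it is given was itself loaded from a jar file. The sketch below is a small plain-Java diagnostic, with no Hadoop dependency, that reports where a class file was loaded from. The class and method names are hypothetical. In the driver one would pass ParallelGeneticAlignment.class instead.

```java
import java.net.URL;

class JarCheckSketch {
    // Returns the URL protocol the class file was loaded from, for example
    // "jar" for a class inside a jar or "file" for a class in a build
    // directory. Returns null if the class file cannot be located.
    static String classFileProtocol(Class<?> cls) {
        String resource = "/" + cls.getName().replace('.', '/') + ".class";
        URL url = cls.getResource(resource);
        return url == null ? null : url.getProtocol();
    }

    // True only when the class file sits inside a jar.
    static boolean loadedFromJar(Class<?> cls) {
        return "jar".equals(classFileProtocol(cls));
    }

    public static void main(String[] args) {
        System.out.println("protocol: " + classFileProtocol(JarCheckSketch.class));
        System.out.println("from jar: " + loadedFromJar(JarCheckSketch.class));
    }
}
```

If this reports anything other than "jar" for the driver class at submit time, that might explain why no job jar is set and why the tasks cannot find the mapper and reducer classes.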

On the university server, one input path is found:
14/06/27 11:45:27 INFO input.FileInputFormat: Total input paths to process :
1
14/06/27 11:45:28 INFO mapreduce.JobSubmitter: number of splits:1

On my machine, no input paths are found:
14/06/27 12:57:09 INFO input.FileInputFormat: Total input paths to process :
0
14/06/27 12:57:36 INFO mapreduce.JobSubmitter: number of splits:0

Being new to this community, I thought it polite to introduce myself. I'm
planning to return to software development via an MSc at Heriot-Watt
University in Edinburgh. My MSc project is based on Fosters Genetic Sequence
Alignment. I have written a sequential version, and my goal is now to port
it to Hadoop.

Thanks in advance, 
Regards,

Chris MacKenzie
