Furthermore, I ran another test: I renamed TeraSort#TotalOrderPartitioner to TeraSort#MyOwnTotalOrderPartitioner to avoid conflicts with other classes of the same name on the Hadoop classpath. I also changed 'job.setPartitionerClass(TotalOrderPartitioner.class);' in TeraSort.java to 'job.setPartitionerClass(MyOwnTotalOrderPartitioner.class);'. However, MyOwnTotalOrderPartitioner still does not seem to be invoked when the terasort job runs.
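For context, the idea behind a total-order partitioner can be sketched in plain Java without Hadoop. Given N-1 sorted split points, each key goes to the partition whose key range contains it, so every key in partition i sorts before every key in partition i+1. This is a minimal sketch that assumes String keys and a hypothetical class name `SplitPointPartitioner`. It uses a plain binary search and is not the code of TeraSort#TotalOrderPartitioner or of Hadoop's own TotalOrderPartitioner.

```java
import java.util.Arrays;

/**
 * Hypothetical stand-alone sketch of total-order partitioning (not Hadoop code).
 * Keys are plain Strings and the split points are held in memory.
 */
final class SplitPointPartitioner {
    // Sorted split points; N partitions need N - 1 of them.
    private final String[] splitPoints;

    SplitPointPartitioner(String[] splitPoints) {
        this.splitPoints = splitPoints.clone();
        Arrays.sort(this.splitPoints);
    }

    /** Number of partitions (reducers) this partitioner targets. */
    int numPartitions() {
        return splitPoints.length + 1;
    }

    /**
     * Returns the number of split points that are <= key, so partition i
     * receives keys in [splitPoints[i - 1], splitPoints[i]).
     */
    int getPartition(String key) {
        int pos = Arrays.binarySearch(splitPoints, key);
        // Exact match: the key opens the partition to the right of that split point.
        // No match: binarySearch returns -(insertionPoint) - 1.
        return pos >= 0 ? pos + 1 : -(pos + 1);
    }

    public static void main(String[] args) {
        SplitPointPartitioner p = new SplitPointPartitioner(new String[] {"g", "p"});
        for (String key : new String[] {"apple", "grape", "melon", "zebra"}) {
            // prints apple -> 0, grape -> 1, melon -> 1, zebra -> 2
            System.out.println(key + " -> partition " + p.getPartition(key));
        }
    }
}
```

Because the output of each reducer covers one contiguous key range, concatenating the reducer outputs in partition order yields a globally sorted result.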
BTW, in TeraSort#TotalOrderPartitioner#readPartitions() there is the statement 'DataInputStream reader = fs.open(p);', and I know 'p' is the path of '_partition.lst'. But two details are unclear to me:
- Where is 'p' located? Is it on HDFS or on the local Linux file system? What is its absolute path?
- Which part or phase of Hadoop MapReduce copies the _partition.lst file to the path 'p'?

I am very confused about this part.

Thanks very much!


2013/10/20 sam liu

> After I took the following actions, the job still passed, and it seems no
> TotalOrderPartitioner class was invoked at all:
> - Modified libexec/hadoop-config.sh to put
> hadoop-mapreduce-examples-2.0.4-alpha.jar at the front of the Hadoop
> classpath, which should ensure that TeraSort#TotalOrderPartitioner is
> loaded first
> - Fiddled with org.apache.hadoop.mapreduce.TotalOrderPartitioner, then
> replaced share/hadoop/mapreduce/hadoop-mapreduce-client-core-2.0.4-alpha.jar
> with the newly built jar
>
>
> 2013/10/19 Arun C Murthy
>
>> Apologies for the late response.
>>
>> In hadoop-2 TeraSort uses the new org.apache.hadoop.mapreduce APIs (not
>> org.apache.hadoop.mapred).
>>
>> Did you fiddle with the right TotalOrderPartitioner,
>> i.e. org.apache.hadoop.mapreduce.TotalOrderPartitioner?
>>
>> Arun
>>
>> On Oct 17, 2013, at 8:12 PM, sam liu wrote:
>>
>> It's really weird and confusing me. Can anyone help with this question?
>>
>> Thanks!
>>
>>
>> 2013/10/16 sam liu
>>
>>> Hi Experts,
>>>
>>> In Hadoop-2.0.4, TeraSort uses TeraSort#TotalOrderPartitioner as its
>>> Partitioner: 'job.setPartitionerClass(TotalOrderPartitioner.class);'.
>>> However, it seems YARN does not execute the methods of
>>> TeraSort#TotalOrderPartitioner at all. I ran some tests to verify this:
>>>
>>> Test 1: Added code to the readPartitions() and setConf() methods of
>>> TeraSort#TotalOrderPartitioner to print some text and write some text
>>> to a file.
>>> Expected result: some text should be printed and written to a file.
>>> Actual result: nothing was printed or written to a file at all.
>>>
>>> Test 2: Removed all existing methods from TeraSort#TotalOrderPartitioner,
>>> leaving only the required methods, with empty bodies.
>>> Expected result: the TeraSort job should fail with an exception, since
>>> the specified Partitioner is not implemented at all.
>>> Actual result: the TeraSort job completed successfully without any
>>> exception.
>>>
>>> These tests confuse me a lot, because it seems YARN never uses the
>>> specified partitioner TeraSort#TotalOrderPartitioner during job execution.
>>>
>>> Can anyone explain why?
>>>
>>> Thanks very much!
>>>
>>
>>
>> --
>> Arun C. Murthy
>> Hortonworks Inc.
>> http://hortonworks.com/
>>
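To make the readPartitions() question more concrete: a partitioner that implements Configurable can load its split points in setConf(), before any getPartition() call, by opening the partition file and reading serialized keys from a DataInputStream, as in the 'DataInputStream reader = fs.open(p);' line quoted above. The sketch below mimics only that read pattern with plain java.io. The file format (an int count followed by writeUTF strings) and the class and method names are hypothetical and are not the real _partition.lst layout; the sketch also does not say where 'p' resolves to on a cluster.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/**
 * Hypothetical sketch of "open the partition file, read the split points".
 * The on-disk format (int count, then one writeUTF string per split point)
 * is an assumption for illustration only.
 */
final class PartitionFileReader {

    /** Serializes split points in the assumed format. */
    static byte[] writeSplitPoints(List<String> splitPoints) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeInt(splitPoints.size());
            for (String s : splitPoints) {
                out.writeUTF(s);
            }
        }
        return bytes.toByteArray();
    }

    /** Reads split points back; in the Hadoop code quoted above, the stream comes from fs.open(p). */
    static List<String> readSplitPoints(InputStream in) throws IOException {
        List<String> result = new ArrayList<>();
        try (DataInputStream reader = new DataInputStream(in)) {
            int count = reader.readInt();
            for (int i = 0; i < count; i++) {
                result.add(reader.readUTF());
            }
        }
        return result;
    }

    public static void main(String[] args) throws IOException {
        byte[] file = writeSplitPoints(Arrays.asList("g", "p"));
        List<String> splits = readSplitPoints(new ByteArrayInputStream(file));
        System.out.println(splits); // prints [g, p]
    }
}
```

Reading everything up front keeps getPartition() itself free of I/O, which matters because it is called once per map output record.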
