Porting Pig unit tests to use MiniDFSCluster and MiniMRCluster on the local 
machine
-----------------------------------------------------------------------------------

                 Key: PIG-72
                 URL: https://issues.apache.org/jira/browse/PIG-72
             Project: Pig
          Issue Type: Test
          Components: tools
            Reporter: Xu Zhang


We have the need to port the Pig unit tests to use MiniDFSCluster and 
MiniMRCluster, so that tests can be executed with the DFS and MR threads on the 
local machine.   This feature will eliminate the need to set up a real 
distributed hadoop cluster before running the unit tests, as everything will 
now be carried out with the (mini) cluster on the user's local machine.  

One prerequisite for using this feature is a hadoop jar that has the class 
files for MiniDFSCluster, MiniMRCluster and other supporting components.  I 
have been able to generate such a jar file with a special target added by 
myself to hadoop's build.xml and have also logged a hadoop jira to request this 
target be a permanent part of that build file.  If possible, we can just 
replace hadoop15.jar with this jar file on the SVN source tree and then the 
users will never need to worry about the availability of this jar file. Please 
find such a hadoop jar file in the attachment.

To use the feature in unit tests, the user just need to call 
MiniClusterBuilder.buildCluster() before a PigServer instance is created with 
the string "mapreduce" as the parameter to its constructor.  Here is an example 
of how the MiniClusterBuilder is used in a test case class:

        public class TestWhatEver extends TestCase {
                private String initString = "mapreduce";
                private MiniClusterBuilder cluster = 
MiniClusterBuilder.buildCluster();
        
                @Test
                public void testGroupCountWithMultipleFields() throws Exception 
{
                        PigServer pig = new PigServer(initString);
                        // Do something with the pig server, such as 
registering and executing Pig 
                        // queries. The queries will executed with the local 
cluster. 
                }
     
                // More test cases if needed
        }

To run the unit tests with the local cluster, under the top directory of the 
source tree, issue the command "ant test". Notice that you do not need to 
specify the location of the hadoop-site.xml file with the command line option 
"-Djunit.hadoop.conf=<dir>" anymore. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

Reply via email to