[ 
https://issues.apache.org/jira/browse/PIG-72?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12570947#action_12570947
 ] 

Xu Zhang commented on PIG-72:
-----------------------------

*We might need to make a choice here.*  

I tried with my previous implementation of the unit test framework again.  It 
does not set up the mini clusters with setUp() and then shut them down with 
tearDown() for each test method.  Instead, it uses a singleton that exists for 
the duration of the execution of a testcase class (which BTW is more like a 
real cluster that always physically exists:-)).  Here is an example of the 
usage of the singleton:

{code}
public class TestWhatEver extends TestCase {
    private String initString = "mapreduce";
    private MiniClusterBuilder cluster = MiniClusterBuilder.buildCluster();

    @Test
    public void testCase1() throws Exception { 
        PigServer pig = new PigServer(initString); 

        // Do something with the pig server, such as registering and executing
        // Pig queries. The queries will be executed with the local cluster.
    }

    @Test
    public void testCase2() throws Exception { 
        PigServer pig = new PigServer(initString); 

        // Do something with the pig server, such as registering and executing
        // Pig queries. The queries will be executed with the local cluster.
    }

    // More test cases if needed
}
{code}

With this implementation, all present Pig unit tests run successfully without 
any error and the total execution time is around 11 minutes on my machine.  

So I would like your opinion on which implementation to use.  The major concern 
that people had with the previous implementation is that it uses finalize() to 
shut down the dfs and mapreduce clusters.  Note that Java runs finalizers on 
leftover objects at JVM exit only when finalization on exit has been enabled 
(for example with the deprecated Runtime.runFinalizersOnExit(true)); with that 
in place, the finalize() method as used in this implementation should not be an 
issue.  I am saying this because, as far as I understand, each JUnit test case 
class is executed in a separate JVM.  So it is to our advantage (in efficiency, 
in running the tests on a more realistic local cluster, and in less chance of 
race conditions) to start the cluster when the test case class is loaded and 
then shut it down when the JVM for the JUnit test case class exits.  
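
For illustration only, here is a minimal sketch of the "one cluster per JVM" 
singleton idea.  It is not the code in the attached patch: SharedCluster and 
its counters are hypothetical stand-ins for the real mini dfs and mapreduce 
clusters, and the sketch swaps finalize() for a JVM shutdown hook 
(Runtime.addShutdownHook), which is one possible way to do the cleanup when 
the JVM exits.

```java
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical stand-in for the mini dfs/mapreduce clusters (not from the patch). */
final class SharedCluster {
    // Counters exist only so the sketch can show "started once, stopped once".
    static final AtomicInteger starts = new AtomicInteger();
    static final AtomicInteger stops = new AtomicInteger();

    private static SharedCluster instance;
    private boolean running;

    private SharedCluster() {
        // A real implementation would start the clusters here.
        running = true;
        starts.incrementAndGet();
    }

    /** Returns the one shared cluster, creating it on first use. */
    static synchronized SharedCluster buildCluster() {
        if (instance == null) {
            instance = new SharedCluster();
            // Assumption: cleanup via a shutdown hook instead of finalize().
            Runtime.getRuntime().addShutdownHook(new Thread(instance::shutDown));
        }
        return instance;
    }

    /** Stops the cluster; calling it more than once is harmless. */
    synchronized void shutDown() {
        if (running) {
            // A real implementation would stop the clusters here.
            running = false;
            stops.incrementAndGet();
        }
    }

    synchronized boolean isRunning() {
        return running;
    }
}
```

Every call to SharedCluster.buildCluster() in the same JVM returns the same 
instance, so the (stand-in) cluster is started once no matter how many test 
methods ask for it, and the hook stops it when that JVM exits.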

FWIW, from the test reports of this implementation it is verified that the 
cluster is set up only once for each test case class.  It is also verified that 
all test case classes use the same set of resources (such as ports) for the dfs 
and mapreduce clusters, which means they are shut down cleanly for each test 
case class.
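
As a rough illustration of the port check, the sketch below tests whether a 
port can be bound again, which is one simple sign that whatever used it 
before has been shut down.  PortCheck and isPortFree are hypothetical names 
and are not part of the test reports or the patch.

```java
import java.io.IOException;
import java.net.ServerSocket;

/** Hypothetical helper (not from the patch) for checking that a port was released. */
final class PortCheck {
    /** Returns true if a server socket can currently be bound to the given port. */
    static boolean isPortFree(int port) {
        try (ServerSocket socket = new ServerSocket(port)) {
            // Binding succeeded, so nothing else is listening on this port.
            return true;
        } catch (IOException e) {
            // Binding failed, most likely because the port is still in use.
            return false;
        }
    }
}
```

A check like this could be run between test case classes against the ports 
the clusters are configured to use.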

Thoughts?

> Porting Pig unit tests to use MiniDFSCluster and MiniMRCluster on the local 
> machine
> -----------------------------------------------------------------------------------
>
>                 Key: PIG-72
>                 URL: https://issues.apache.org/jira/browse/PIG-72
>             Project: Pig
>          Issue Type: Test
>          Components: tools
>            Reporter: Xu Zhang
>         Attachments: hadoop-0.15.3-dev-test-utils.jar, 
> PortPigUnitTestToMiniClusters.patch, 
> TEST-org.apache.pig.test.TestAlgebraicEval.txt
>
>
> We have the need to port the Pig unit tests to use MiniDFSCluster and 
> MiniMRCluster, so that tests can be executed with the DFS and MR threads on 
> the local machine.   This feature will eliminate the need to set up a real 
> distributed hadoop cluster before running the unit tests, as everything will 
> now be carried out with the (mini) cluster on the user's local machine.  
> One prerequisite for using this feature is a hadoop jar that has the class 
> files for MiniDFSCluster, MiniMRCluster and other supporting components.  I 
> have been able to generate such a jar file with a special target added by 
> myself to hadoop's build.xml and have also logged a hadoop jira to request 
> this target be a permanent part of that build file.  If possible, we can just 
> replace hadoop15.jar with this jar file on the SVN source tree and then the 
> users will never need to worry about the availability of this jar file. 
> Please find such a hadoop jar file in the attachment.
> To use the feature in unit tests, the user just needs to call 
> MiniClusterBuilder.buildCluster() before a PigServer instance is created with 
> the string "mapreduce" as the parameter to its constructor.  Here is an 
> example of how the MiniClusterBuilder is used in a test case class:
>         public class TestWhatEver extends TestCase {
>                 private String initString = "mapreduce";
>                 private MiniClusterBuilder cluster = MiniClusterBuilder.buildCluster();
>
>                 @Test
>                 public void testGroupCountWithMultipleFields() throws Exception {
>                         PigServer pig = new PigServer(initString);
>                         // Do something with the pig server, such as registering and
>                         // executing Pig queries. The queries will be executed with
>                         // the local cluster.
>                 }
>
>                 // More test cases if needed
>         }
> To run the unit tests with the local cluster, under the top directory of the 
> source tree, issue the command "ant test". Notice that you do not need to 
> specify the location of the hadoop-site.xml file with the command line option 
> "-Djunit.hadoop.conf=<dir>" anymore. 

