[jira] Commented: (PIG-1404) PigUnit - Pig script testing simplified.

Romain Rigaux (JIRA) Thu, 27 May 2010 18:23:03 -0700

    [ 
https://issues.apache.org/jira/browse/PIG-1404?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12872751#action_12872751
 ]


Romain Rigaux commented on PIG-1404:
------------------------------------

Thank you for having a look at it, Alan.

Overall it is good, but I have been working on a few small simplifications that 
can be useful when testing scripts with many tests and big cache archives.

In addition, it might be worth the effort to add some Web documentation about 
it.

I can update the patch next week.

> PigUnit - Pig script testing simplified. 
> -----------------------------------------
>
>                 Key: PIG-1404
>                 URL: https://issues.apache.org/jira/browse/PIG-1404
>             Project: Pig
>          Issue Type: New Feature
>            Reporter: Romain Rigaux
>            Assignee: Romain Rigaux
>             Fix For: 0.8.0
>
>         Attachments: commons-lang-2.4.jar, PIG-1404-2.patch, PIG-1404.patch
>
>
> The goal is to provide a simple xUnit framework that enables our Pig scripts 
> to be easily:
>   - unit tested
>   - regression tested
>   - quickly prototyped
> No cluster set up is required.
> For example:
> TestCase
> {code}
>   @Test
>   public void testTop3Queries() {
>     String[] args = {
>         "n=3",
>     };
>     test = new PigTest("top_queries.pig", args);
>     String[] input = {
>         "yahoo\t10",
>         "twitter\t7",
>         "facebook\t10",
>         "yahoo\t15",
>         "facebook\t5",
>         ....
>     };
>     String[] output = {
>         "(yahoo,25L)",
>         "(facebook,15L)",
>         "(twitter,7L)",
>     };
>     test.assertOutput("data", input, "queries_limit", output);
>   }
> {code}
> top_queries.pig
> {code}
> data =
>     LOAD '$input'
>     AS (query:CHARARRAY, count:INT);
>      
>     ... 
>     
> queries_sum = 
>     FOREACH queries_group 
>     GENERATE 
>         group AS query, 
>         SUM(queries.count) AS count;
>         
>     ...
>             
> queries_limit = LIMIT queries_ordered $n;
> STORE queries_limit INTO '$output';
> {code}
> There are 3 modes:
> * LOCAL (if the "pigunit.exectype.local" property is present)
> * MAPREDUCE (uses the cluster specified in the classpath, same as 
> HADOOP_CONF_DIR)
> ** automatic mini cluster (the default; the HADOOP_CONF_DIR to have in 
> the classpath will be ~/pigtest/conf)
> ** pointing to an existing cluster (if the "pigunit.exectype.cluster" property 
> is present)
> For now, it would be nice to see how this idea could be integrated into 
> Piggybank and whether PigParser/PigServer could improve their interfaces in 
> order to keep PigUnit simple.
> Other components based on PigUnit could be built later:
>   - standalone MiniCluster
>   - notion of workspaces for each test
>   - standalone utility that reads test configuration and generates a test 
> report...
> It is a first prototype, open to suggestions, and can definitely benefit 
> from feedback.
> How to test, in pig_trunk:
> {code}
> Apply patch
> $pig_trunk ant compile-test
> $pig_trunk ant
> $pig_trunk/contrib/piggybank/java ant test -Dtest.timeout=999999
> {code}
> (it takes 15 min in MAPREDUCE mini cluster mode; tests will need to be split 
> in the future between 'unit' and 'integration')
> Many examples are in:
> {code}
> contrib/piggybank/java/src/test/java/org/apache/pig/piggybank/test/pigunit/TestPigTest.java
> {code}
> When used standalone, do not forget to add commons-lang-2.4.jar and the 
> HADOOP_CONF_DIR of your cluster to your CLASSPATH.
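
To make the three modes in the description quoted above easier to follow, here is a minimal sketch of how the selection could look. Only the two property names come from the description. The class name, the enum, the method, and the choice to let the local property win when both are set are assumptions made for illustration, not the code in the patch.

```java
import java.util.Properties;

// Hypothetical helper, not part of the patch: picks an execution mode
// from the two properties named in the issue description.
public class PigUnitModeSketch {

    // Names chosen for this sketch only.
    enum Mode { LOCAL, MINI_CLUSTER, EXISTING_CLUSTER }

    static Mode resolveMode(Properties props) {
        // Assumption: the mere presence of the property selects the mode,
        // and the local property takes precedence if both are present.
        if (props.containsKey("pigunit.exectype.local")) {
            return Mode.LOCAL;
        }
        if (props.containsKey("pigunit.exectype.cluster")) {
            return Mode.EXISTING_CLUSTER;
        }
        // Neither property set: the automatic mini cluster is the default.
        return Mode.MINI_CLUSTER;
    }

    public static void main(String[] args) {
        // e.g. java -Dpigunit.exectype.local=true PigUnitModeSketch
        System.out.println(resolveMode(System.getProperties()));
    }
}
```

Run with no -D flag, this sketch falls through to the mini cluster mode, which matches the default stated in the description.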

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
