I was finally able to write a unit test, something like the one below.
It seems to work, so I think the way I understood these records is probably
correct.
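// Imports assumed: log4j for Logger and JUnit 4 for @Test/Assert.
import java.io.IOException;

import org.apache.log4j.Logger;
import org.apache.pig.backend.executionengine.ExecException;
import org.apache.pig.data.BagFactory;
import org.apache.pig.data.DataBag;
import org.apache.pig.data.Tuple;
import org.apache.pig.data.TupleFactory;
import org.junit.Assert;
import org.junit.Test;

// OUTPUT is the UDF under test; PigUtil is a helper class not shown here.
public class OUTPUTTest {
    private static final Logger log = Logger.getLogger(OUTPUTTest.class);
    TupleFactory mTupleFactory = TupleFactory.getInstance();
    BagFactory mBagFactory = BagFactory.getInstance();

    @Test
    public void evalFuncTest() throws IOException {
        // The input line this test models (kept for reference, not used below).
        String record = "a b "
                + "{(ST:NC),(ZIP:28613),(CITY:Xxxxxxx),(NAM2:Xxxxx X &xxx; Xxxxx X Xxxxxx)} "
                + "{(OCCUP:xxxxxxx xxxxx),(AGE:55 ),(MARITAL:Married)}";
        // One inner array per field: a single element is a plain field,
        // several elements are the items of a bag.
        String records[][] = {
                { "a" },
                { "b" },
                { "ST:NC", "ZIP:28613", "CITY:Xxxxxxx",
                        "NAM2:Xxxxx X &xxx; Xxxxx X Xxxxxx" },
                { "OCCUP:xxxxxxx xxxxx", "AGE:55", "MARITAL:Married" } };
        Tuple t = mTupleFactory.newTuple(4);
        loadTuple(t, records);
        OUTPUT Out = new OUTPUT();
        DataBag bag = Out.exec(t);
        //PigUtil.printBagAsString(bag);
        Tuple [] ts = PigUtil.getTuples(bag);
        String expectedValue = "a b 55 Xxxxxxx Married Xxxxx X &xxx; Xxxxx X "
                + "Xxxxxx xxxxxxx xxxxx NC 28613";
        Assert.assertEquals(expectedValue, ts[0].get(0));
    }

    // Fills tuple t: one-element inputs become plain fields, longer ones become bags.
    static public void loadTuple(Tuple t, String[][] input)
            throws ExecException {
        for (int i = 0; i < input.length; i++) {
            log.info("Length " + input[i].length);
            if (input[i].length == 1) {
                t.set(i, input[i][0]);
            } else if (input[i].length > 1) {
                t.set(i, loadBag(t, input[i]));
            }
        }
    }

    // Wraps each string in its own single-field tuple and collects them in a bag.
    static public DataBag loadBag(Tuple t, String[] input)
            throws ExecException {
        DataBag bag = BagFactory.getInstance().newDefaultBag();
        for (int i = 0; i < input.length; i++) {
            Tuple f = TupleFactory.getInstance().newTuple(1);
            f.set(0, input[i]);
            bag.add(f);
        }
        return bag;
    }
}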
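
For reference, question 1 in the quoted message below (how to break the line
into an array of String) can be approached with plain Java string handling.
This is only a sketch, not something from the thread: RecordSplitter and its
methods are hypothetical names, and it assumes that fields are tab-delimited,
that bags are written as {(item),(item)}, and that items never contain a tab
or the sequence "),(".

```java
import java.util.regex.Pattern;

// Hypothetical helper, not part of Pig: turns one text line into the
// String[][] shape used by the test (one inner array per field).
final class RecordSplitter {

    // Splits the line on the delimiter, then expands each field.
    static String[][] split(String line, String delimiter) {
        String[] fields = line.split(Pattern.quote(delimiter), -1);
        String[][] result = new String[fields.length][];
        for (int i = 0; i < fields.length; i++) {
            result[i] = splitField(fields[i].trim());
        }
        return result;
    }

    // A field written as {(x),(y)} yields its items; anything else is
    // returned as a one-element array. An empty bag "{}" yields no items.
    static String[] splitField(String field) {
        boolean isBag = field.length() >= 2
                && field.startsWith("{") && field.endsWith("}");
        if (!isBag) {
            return new String[] { field };
        }
        String inner = field.substring(1, field.length() - 1).trim();
        if (inner.isEmpty()) {
            return new String[0];
        }
        // Drop the outermost "(" and ")" so only the "),(" separators remain.
        if (inner.startsWith("(") && inner.endsWith(")")) {
            inner = inner.substring(1, inner.length() - 1);
        }
        return inner.split(Pattern.quote("),("), -1);
    }
}
```

The result has the same String[][] shape as `records` in the test above, so it
could be handed to loadTuple. One caveat: loadTuple treats any one-element
array as a plain field, so a bag with a single item would be loaded as a
string rather than as a bag.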
On Tue, Apr 24, 2012 at 11:38 AM, Mohit Anchlia <[email protected]> wrote:
> I am still having difficulty converting this line from a file to a tuple.
>
> 1333477861077/home/hadoop/pigtest/./formml_dat/999000093_return.xml
> 04/03/12 11:36:25 {(ST:NC),(ZIP:28613),(CITY:Xxxxxxx),(NAM2:Xxxxx X &xxx;
> Xxxxx X Xxxxxx)} {(OCCUP:xxxxxxx xxxxx),(AGE:55),(MARITAL:Married)}
>
> I looked at:
>
> static public Tuple loadTuple(Tuple t, String[] input) throws
> ExecException {
> for (int i = 0; i < input.length; i++) {
> t.set(i, input[i]);
> }
> return t;
> }
>
>
> but now my questions are:
> 1. How do I break it into an array of String?
> 2. Are the first 2 fields also tuples?
> 3. Do I just pass the Bag in the input string?
>
> If someone could help me break down the above line so that I can call
> loadTuple, that would be helpful. It would also help me understand what that
> line is made up of.
>
>
>
> On Fri, Apr 20, 2012 at 9:43 PM, Russell Jurney <[email protected]> wrote:
>
>> The unit tests for TOP should be helpful?
>>
>> Russell Jurney http://datasyndrome.com
>>
>> On Apr 20, 2012, at 6:40 PM, Thejas Nair <[email protected]> wrote:
>>
>> > Though not exactly what you are asking for, there is a
>> > getTuplesFromConstantTupleStrings function in
>> > test/org/apache/pig/test/Util.java that converts string representations
>> > of tuples to tuple objects. It is an easier and more maintainable way of
>> > creating tuples in test cases.
>> >
>> > For example - List<Tuple> expectedRes =
>> > Util.getTuplesFromConstantTupleStrings(
>> > new String[] {
>> > "(10,20,30,40L)",
>> > "(11,21,31,41L)",
>> > });
>> >
>> > But it is not exposed as a public interface right now. It makes sense to
>> > make it part of a public interface.
>> >
>> > -Thejas
>> >
>> >
>> > On 4/20/12 7:48 AM, Mohit Anchlia wrote:
>> >> Thanks for your response. Yes, I am using those in my UDF eval function.
>> >> Actually my question was about how to build the tuple. Is there a
>> >> utility method that would let me build my tuple with the following
>> >> record type? I need to populate the tuple in the format below so that I
>> >> can pass it in the unit test. It's tab delimited and also has bags.
>> >>
>> >> 1333477861077/home/hadoop/pigtest/./formml_dat/999000093_return.xml
>> >> 04/03/12 11:36:25 {(ST:NC),(ZIP:28613),(CITY:Xxxxxxx),(NAM2:Xxxxx X&xxx;
>> >> Xxxxx X Xxxxxx)} {(OCCUP:xxxxxxx xxxxx),(AGE:55),(MARITAL:Married
>> >>
>> >> On Thu, Apr 19, 2012 at 6:44 PM, Dmitriy Ryaboy <[email protected]> wrote:
>> >>
>> >>> Something like this (not tested):
>> >>>
>> >>> List<Tuple> bagtuples = Lists.newArrayList();
>> >>>
>> >>> // populate inner tuples, then...
>> >>>
>> >>> DataBag myBag = BagFactory.getInstance().newBag(bagtuples);
>> >>> Tuple t = TupleFactory.getInstance().newTuple(myBag);
>> >>>
>> >>> D
>> >>>
>> >>>
>> >>> On Thu, Apr 19, 2012 at 5:51 PM, Mohit Anchlia <[email protected]> wrote:
>> >>>> Thanks! I am trying to figure out how to create a Tuple object that also
>> >>>> has bags in it. I have a record like this that I want to pass to a UDF as
>> >>>> a tuple. Any info would be very helpful.
>> >>>>
>> >>>>
>> >>>> 1333477861077/home/hadoop/pigtest/./formml_dat/999000093_return.xml
>> >>>> 04/03/12 11:36:25 {(ST:NC),(ZIP:28613),(CITY:Xxxxxxx),(NAM2:Xxxxx X&xxx;
>> >>>> Xxxxx X Xxxxxx)} {(OCCUP:xxxxxxx xxxxx),(AGE:55),(MARITAL:Married)}
>> >>>>
>> >>>>
>> >>>> On Thu, Apr 19, 2012 at 5:16 PM, Dmitriy Ryaboy <[email protected]> wrote:
>> >>>>
>> >>>>> Hi Mohit,
>> >>>>> We just write standard Java unit tests for Pig UDFs. You can see a ton
>> >>>>> of them here:
>> >>>>>
>> >>>>> https://github.com/apache/pig/blob/trunk/test/org/apache/pig/test/TestStringUDFs.java
>> >>>>>
>> >>>>> Does that help?
>> >>>>>
>> >>>>> D
>> >>>>>
>> >>>>> On Thu, Apr 19, 2012 at 5:05 PM, Mohit Anchlia <[email protected]> wrote:
>> >>>>>> Is there a way I can just unit test my Pig UDF? What's the best way to
>> >>>>>> unit test in Pig? I saw pigunittest but couldn't find a way to unit test
>> >>>>>> a UDF.
>> >>>>>
>> >>>
>> >>
>> >
>>
>
>