Hello -
I'd like to use Pig to process log files containing BigDecimals. I'm
loading my data as JSON via a custom LoadFunc. One approach seems to be
to represent the BigDecimal fields as DataType.BYTEARRAY, and then write
an algebraic EvalFunc:
Example:
1. Data
class Log {
    String id;
    long timestamp;
    BigDecimal costA;
    BigDecimal costB;
    // getters/setters (getID(), getTimestamp(), getCostA(), getCostB()) omitted
}
2. Convert a Log instance to JSON with Jackson:
ObjectMapper mapper = new ObjectMapper();  // org.codehaus.jackson.map.ObjectMapper
String encoded = mapper.writeValueAsString(logEntry);
3. Generated log files look like this:
{"id":"someid", "timestamp":"sometimestamp", "costA":1.00, "costB":1.23456}
{"id":"someid", "timestamp":"sometimestamp", "costA":2.00, "costB":2.23456}
{"id":"someid", "timestamp":"sometimestamp", "costA":3.00, "costB":3.23456}
4. In a custom Pig LoadFunc, decode each JSON line (rough getNext() sketch after step 5):
Log logEntry = mapper.readValue(encoded, Log.class);
5. Convert the hydrated logEntry to a Pig Tuple:
Tuple tuple = TupleFactory.getInstance().newTuple(NUMBER_OF_FIELDS);
tuple.set(0, logEntry.getID());
tuple.set(1, logEntry.getTimestamp());
tuple.set(2, logEntry.getCostA());
tuple.set(3, logEntry.getCostB());
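For context, here is roughly how steps 4 and 5 sit inside my LoadFunc's
getNext(). The class name is a placeholder, the InputFormat/RecordReader
plumbing is just the usual line-oriented boilerplate (simplified), and the
cost fields are left out because they are the part in question:

import java.io.IOException;

import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.InputFormat;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.RecordReader;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.pig.LoadFunc;
import org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigSplit;
import org.apache.pig.data.Tuple;
import org.apache.pig.data.TupleFactory;
import org.codehaus.jackson.map.ObjectMapper;

public class JsonLogLoader extends LoadFunc {

    private static final int NUMBER_OF_FIELDS = 4;
    private final ObjectMapper mapper = new ObjectMapper();
    private RecordReader reader;

    @Override
    public void setLocation(String location, Job job) throws IOException {
        FileInputFormat.setInputPaths(job, location);
    }

    @Override
    public InputFormat getInputFormat() throws IOException {
        return new TextInputFormat();
    }

    @Override
    public void prepareToRead(RecordReader reader, PigSplit split) throws IOException {
        this.reader = reader;
    }

    @Override
    public Tuple getNext() throws IOException {
        try {
            if (!reader.nextKeyValue()) {
                return null;                                      // end of input
            }
            String encoded = ((Text) reader.getCurrentValue()).toString();
            Log logEntry = mapper.readValue(encoded, Log.class);  // step 4

            Tuple tuple = TupleFactory.getInstance().newTuple(NUMBER_OF_FIELDS);
            tuple.set(0, logEntry.getID());                       // chararray
            tuple.set(1, logEntry.getTimestamp());                // long
            // tuple.set(2, ...) / tuple.set(3, ...): the costA/costB fields
            // are the open question, see below.
            return tuple;
        } catch (InterruptedException e) {
            throw new IOException(e);
        }
    }
}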
The problem is that you cannot simply set a BigDecimal into the DefaultTuple,
since Pig does not recognize BigDecimal as one of its data types.
So, what is the recommended way to proceed from here?
* Do I write my own Tuple impl?
* Do I shove the BigDecimal into the DefaultTuple as a byte array and
use an EvalFunc to read the byte array field back? That func could then
reconstruct the BigDecimal and perform the BigDecimal.add() (rough sketch below).
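To make the second option concrete, here is a rough sketch of what I have in
mind, assuming the LoadFunc passes the costs through as the BigDecimals'
plain-string bytes. The class and field names are just placeholders, and a
real algebraic version would also implement Pig's Algebraic interface; this
only shows the basic exec():

// In the LoadFunc, instead of setting the BigDecimals directly:
tuple.set(2, new DataByteArray(logEntry.getCostA().toString()));
tuple.set(3, new DataByteArray(logEntry.getCostB().toString()));

// Then a UDF reconstructs the BigDecimals from a bag of those bytearray
// fields and sums them, returning the total as a chararray:
import java.io.IOException;
import java.math.BigDecimal;
import java.util.Iterator;

import org.apache.pig.EvalFunc;
import org.apache.pig.data.DataBag;
import org.apache.pig.data.DataByteArray;
import org.apache.pig.data.Tuple;

public class BigDecimalSum extends EvalFunc<String> {

    @Override
    public String exec(Tuple input) throws IOException {
        if (input == null || input.size() == 0 || input.get(0) == null) {
            return null;
        }
        DataBag values = (DataBag) input.get(0);
        BigDecimal sum = BigDecimal.ZERO;
        Iterator<Tuple> it = values.iterator();
        while (it.hasNext()) {
            DataByteArray raw = (DataByteArray) it.next().get(0);
            if (raw == null) {
                continue;
            }
            // Rebuild the BigDecimal from its string representation and add.
            sum = sum.add(new BigDecimal(raw.toString()));
        }
        return sum.toString();
    }
}

In the script this would be called after a GROUP, e.g. something like
FOREACH grouped GENERATE BigDecimalSum(logs.costA) (with the UDF jar
REGISTERed), but I'd like to confirm this is the right approach before
going down that road.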
-Todd