The latter.
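
That is: keep the costs as bytearrays in the tuple and decode them inside
the UDF. Untested sketch, and the class/loader names below are mine, so
adjust to your code. In the LoadFunc, serialize via toString(), e.g.

tuple.set(2, new DataByteArray(logEntry.getCostA().toString().getBytes("UTF-8")));
tuple.set(3, new DataByteArray(logEntry.getCostB().toString().getBytes("UTF-8")));

(getNext() already throws IOException, which covers the charset lookup).
Then something along these lines on the read side:

import java.io.IOException;
import java.math.BigDecimal;
import org.apache.pig.EvalFunc;
import org.apache.pig.data.DataByteArray;
import org.apache.pig.data.Tuple;

// Hypothetical UDF: rebuilds the two cost fields from their bytearray
// encoding and returns the sum rendered back to a chararray, since Pig
// has no BigDecimal type.
public class AddCosts extends EvalFunc<String> {
    @Override
    public String exec(Tuple input) throws IOException {
        if (input == null || input.size() < 2
                || input.get(0) == null || input.get(1) == null) {
            return null;
        }
        BigDecimal a = new BigDecimal(new String(((DataByteArray) input.get(0)).get(), "UTF-8"));
        BigDecimal b = new BigDecimal(new String(((DataByteArray) input.get(1)).get(), "UTF-8"));
        return a.add(b).toString();
    }
}

And in the script (MyJsonLoader standing in for whatever your loader is
called):

A = load 'logs' using MyJsonLoader() as (id:chararray, ts:chararray, costA:bytearray, costB:bytearray);
B = foreach A generate id, AddCosts(costA, costB) as total;

toString()/new BigDecimal(String) round-trips value and scale exactly, so
you never lose precision the way you would bouncing through double.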
On Thu, Jul 8, 2010 at 5:09 PM, ToddG <[email protected]> wrote:
> Hello -
>
> I'd like to use Pig to process log files containing BigDecimals. I'm
> loading my data as JSON via a custom LoadFunc. One approach seems to be to
> represent the BigDecimal fields as DataType.BYTEARRAY, and then write an
> algebraic EvalFunc:
>
> Example:
>
> 1. Data
>
> class Log{
> String id;
> long timestamp;
> BigDecimal costA;
> BigDecimal costB;
> }
>
> 2. Convert the Log class to JSON:
>
> ObjectMapper mapper = new ObjectMapper(); // org.codehaus.jackson.map.ObjectMapper
> String encoded = mapper.writeValueAsString(logEntry);
>
> 3. Generated log files look like this:
>
> {"id":"someid", "timestamp":"sometimestamp", "costA":1.00, "costB":1.23456}
> {"id":"someid", "timestamp":"sometimestamp", "costA":2.00, "costB":2.23456}
> {"id":"someid", "timestamp":"sometimestamp", "costA":3.00, "costB":3.23456}
>
> 4. In a custom Pig LoadFunc, decode the JSON:
>
> Log logEntry = mapper.readValue(encoded, Log.class);
>
> 5. Convert the hydrated logEntry to Pig Tuple:
>
> Tuple tuple = TupleFactory.getInstance().newTuple(NUMBER_OF_FIELDS);
> tuple.set(0, logEntry.getID());
> tuple.set(1, logEntry.getTimestamp());
> tuple.set(2, logEntry.getCostA());
> tuple.set(3, logEntry.getCostB());
>
> Except that you clearly cannot set a BigDecimal into the DefaultTuple, as
> Pig's type system does not recognize BigDecimal.
>
> So, what is the recommended way to proceed from here?
>
> * Do I write my own Tuple impl?
> * Do I shove the BigDecimal into the DefaultTuple as a byte array and use
> an EvalFunc to read the byte array field? This func could then create a
> BigDecimal and perform the BigDecimal.add().
>
> -Todd
>
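
For the algebraic piece (so a SUM over a group can use the combiner), the
usual shape is an outer EvalFunc that implements Algebraic plus small stage
classes, each of which folds a bag of partial results. Untested sketch,
same caveats on names; the partials travel as strings because Pig has no
native BigDecimal type:

import java.io.IOException;
import java.math.BigDecimal;
import org.apache.pig.Algebraic;
import org.apache.pig.EvalFunc;
import org.apache.pig.data.DataBag;
import org.apache.pig.data.Tuple;
import org.apache.pig.data.TupleFactory;

public class BigDecimalSum extends EvalFunc<String> implements Algebraic {

    // Sums one bag of values. Raw fields arrive as DataByteArray and
    // partial results as String; toString() is safe for both because the
    // payload is plain ASCII digits.
    private static String sumBag(Tuple input) throws IOException {
        DataBag bag = (DataBag) input.get(0);
        BigDecimal sum = BigDecimal.ZERO;
        for (Tuple t : bag) {
            Object v = t.get(0);
            if (v != null) {
                sum = sum.add(new BigDecimal(v.toString()));
            }
        }
        return sum.toString();
    }

    // Non-combinable path.
    @Override
    public String exec(Tuple input) throws IOException {
        return sumBag(input);
    }

    public String getInitial()  { return Step.class.getName(); }
    public String getIntermed() { return Step.class.getName(); }
    public String getFinal()    { return Final.class.getName(); }

    // Map-side / combiner-side stages: emit the running sum wrapped in a tuple.
    public static class Step extends EvalFunc<Tuple> {
        @Override
        public Tuple exec(Tuple input) throws IOException {
            return TupleFactory.getInstance().newTuple(sumBag(input));
        }
    }

    // Reduce-side stage: produce the final chararray.
    public static class Final extends EvalFunc<String> {
        @Override
        public String exec(Tuple input) throws IOException {
            return sumBag(input);
        }
    }
}

Used against the B relation from the earlier sketch:

grouped = group B all;
sums = foreach grouped generate BigDecimalSum(B.total);

-T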