Follow Up: Thanks Dmitriy, that worked out really well. I just followed the
lead of builtin/IntAvg.java. In my case, I wound up storing intermediate
BigDecimal values as chararrays: expensive to create all those objects,
but conceptually simple.
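For illustration, the chararray round-trip described above might look like this. This is a plain-Java sketch with hypothetical helper names (encode/decode); in a real algebraic EvalFunc the String would travel through the intermediate tuples, since Pig chararrays map to Java Strings:

```java
import java.math.BigDecimal;

public class ChararrayRoundTrip {
    // Encode a BigDecimal as a chararray-friendly String.
    // toPlainString() avoids scientific notation, so the value
    // survives the round trip without loss of precision.
    static String encode(BigDecimal value) {
        return value.toPlainString();
    }

    // Rebuild the BigDecimal from its String form in the next
    // phase of the algebraic computation (Initial/Intermed/Final).
    static BigDecimal decode(String chararray) {
        return new BigDecimal(chararray);
    }

    public static void main(String[] args) {
        // Simulate one combine step: decode two partials, add, re-encode.
        BigDecimal a = decode("1.23456");
        BigDecimal b = decode("2.23456");
        System.out.println(encode(a.add(b))); // prints 3.46912
    }
}
```

The object churn Todd mentions comes from creating a fresh BigDecimal and String per record; the upside is that the intermediate state is a type Pig already understands.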
-Todd
On 7/8/10 5:15 PM, Dmitriy Ryaboy wrote:
the latter.
On Thu, Jul 8, 2010 at 5:09 PM, ToddG<[email protected]> wrote:
Hello -
I'd like to use Pig to process log files containing BigDecimals. I'm
loading my data as JSON via a custom LoadFunc. One approach seems to be to
represent the BigDecimal fields as DataType.BYTEARRAY, and then write an
algebraic EvalFunc:
Example:
1. Data
class Log {
String id;
long timestamp;
BigDecimal costA;
BigDecimal costB;
}
2. Convert the Log class to JSON:
org.codehaus.jackson.map.ObjectMapper mapper...
mapper.writeValueAsString(object)
3. Generated log files look like this:
{"id":"someid", "timestamp":"sometimestamp", "costA":1.00, "costB":1.23456}
{"id":"someid", "timestamp":"sometimestamp", "costA":2.00, "costB":2.23456}
{"id":"someid", "timestamp":"sometimestamp", "costA":3.00, "costB":3.23456}
4. In a custom Pig LoadFunc, decode the JSON:
Log logEntry = mapper.readValue(encoded, Log.class);
5. Convert the hydrated logEntry to Pig Tuple:
Tuple tuple = TupleFactory.getInstance().newTuple(NUMBER_OF_FIELDS);
tuple.set(0, logEntry.getID());
tuple.set(1, logEntry.getTimestamp());
tuple.set(2, logEntry.getCostA());
tuple.set(3, logEntry.getCostB());
Except that you cannot set a BigDecimal into the DefaultTuple, as Pig
does not recognize BigDecimal as one of its known DataTypes.
So, what is the recommended way to proceed from here?
* Do I write my own Tuple impl?
* Do I shove the BigDecimal into the DefaultTuple as a byte array and use
an EvalFunc to read the byte array field? This func could then create a
BigDecimal and perform the BigDecimal.add().
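As a sketch of that second option: if the bytearray field holds the UTF-8 text of the number (as it would coming straight out of the JSON), the EvalFunc's exec() could decode and add it like this. This is plain Java without the Pig classes (a real version would unwrap a DataByteArray from the input Tuple); fromBytes is a hypothetical helper name:

```java
import java.math.BigDecimal;
import java.nio.charset.StandardCharsets;

public class ByteArraySum {
    // Decode a bytearray field back into a BigDecimal, assuming the
    // bytes are the UTF-8 textual form of the number (e.g. "1.00").
    static BigDecimal fromBytes(byte[] raw) {
        return new BigDecimal(new String(raw, StandardCharsets.UTF_8));
    }

    public static void main(String[] args) {
        // Stand-ins for two bytearray fields pulled from input tuples.
        byte[] costA = "1.00".getBytes(StandardCharsets.UTF_8);
        byte[] costB = "1.23456".getBytes(StandardCharsets.UTF_8);

        // The exec() body would accumulate with BigDecimal.add().
        BigDecimal sum = fromBytes(costA).add(fromBytes(costB));
        System.out.println(sum.toPlainString()); // prints 2.23456
    }
}
```

Note that BigDecimal.add() keeps the larger scale of the two operands, so no precision is lost in the sum.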
-Todd