If the arbitrary objects you refer to fit nicely into Pig's notion of tuples/bags/maps/primitives, then you can use those types directly.
Otherwise, because Pig's schema has limited support for complex or arbitrary objects (there is no support for something like Writable, for example), you will most probably need to treat the objects as bytearray (assuming they are serializable) and convert to/from byte[] as part of their use. Pig currently does not allow you to decouple an object from its serialization.
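To make the to/from byte[] step concrete, here is a minimal sketch using plain Java serialization. The class name ObjectBytes and its two helper methods are hypothetical, made up for illustration, and it assumes the object implements java.io.Serializable. For a Hadoop Writable the same idea would apply, but going through its write(DataOutput) and readFields(DataInput) methods instead. Inside a Pig load function or UDF, the resulting bytes would then be wrapped in an org.apache.pig.data.DataByteArray.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

// Hypothetical helper, not part of Pig: converts a Serializable object
// to a byte[] and back using standard Java serialization.
final class ObjectBytes {
    private ObjectBytes() {}

    // Serialize the object into an in-memory byte array.
    static byte[] toBytes(Serializable value) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(buffer)) {
            out.writeObject(value);
        }
        return buffer.toByteArray();
    }

    // Rebuild the object from bytes produced by toBytes().
    // The caller is responsible for casting to the expected type.
    static Object fromBytes(byte[] bytes) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bytes))) {
            return in.readObject();
        }
    }
}
```

A loader would call something like toBytes() when building each tuple field, and any UDF that needs the real object would call fromBytes() on the field's bytes and cast the result. The cost of this approach is that every use of the object pays for a deserialization.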
Regards,
Mridul

On Thursday 07 April 2011 07:00 AM, Mark wrote:
If I wanted to load arbitrary objects into tuples, what classes should I be looking at? Would I need some sort of storage class? For example, I have a data file that contains org.apache.mahout.fpm.pfpgrowth.convertors.string.TopKStringPatterns objects. I would like to iterate over them in Pig using something like:

rows = LOAD 'data' USING TopKStringPatternsStorage();

Is this correct? Is there a wiki page on creating storage classes? Is there anything I should look out for?

Thanks for the pointers.
