What about denormalizing and just representing these as 4-tuples of (id, type, name, value) in a text file? You could always then group by type if you need to get back to distinct types.
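A minimal sketch of that layout in Pig Latin, assuming a tab-delimited file (the file name 'entity_attrs.txt' and the field names below are illustrative assumptions, not anything from your actual schema):

  -- one (id, type, name, value) tuple per line, tab-delimited (PigStorage default)
  attrs = LOAD 'entity_attrs.txt'
          AS (id:chararray, type:chararray, name:chararray, value:chararray);

  -- group back to one bag of attribute rows per entity type
  by_type = GROUP attrs BY type;

  -- e.g. count attribute rows per type
  counts = FOREACH by_type GENERATE group AS type, COUNT(attrs) AS n;
  DUMP counts;

From by_type you can FILTER on group to pull out a single entity type, or FLATTEN the bag to get the original rows back, so the denormalized file doesn't lose anything.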
Are you joining against a larger dataset? I ask just because 10x200 values is not a lot and can be done without Hadoop.

On Wed, Mar 21, 2012 at 11:49 AM, shan s <[email protected]> wrote:

> In the relational database we have a large key/value type of data in 2
> tables. Let's call it Entity and EntityAttribute.
>
> Table: Entity           Columns: EntityID, EntityType
> Table: EntityAttribute  Columns: EntityID, PropertyName, PropertyValue
>
> These entities are loosely related to each other, hence they are under a
> single roof. There are approx. 100 attributes among the entities and 20
> different entity types.
>
> My questions are:
> - What is the best way to represent this kind of key-value pair data for
>   processing with Pig?
> - Do I represent it as key=value pairs in text files? If so, how would I
>   process such data in Pig?
> - Any pointers to UDFs that help with key-value pairs would be great.
>
> Many Thanks,
> Shan

--
Note that I'm no longer using my Yahoo! email address. Please email me at [email protected] going forward.
