What about denormalizing and just representing these as 4-tuples of (id,
type, name, value) in a text file? You could always then group by type if
you need to get back to distinct types.
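
For example, a minimal Pig Latin sketch of that approach (assuming
tab-delimited text and a hypothetical file name 'entities.txt'):

    -- Load the denormalized (id, type, name, value) 4-tuples.
    entities = LOAD 'entities.txt'
        AS (id:chararray, type:chararray, name:chararray, value:chararray);

    -- Group by type to get back to distinct entity types; each group holds
    -- the bag of attribute tuples for that type.
    by_type = GROUP entities BY type;
    DUMP by_type;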

Are you joining against a larger dataset? I ask only because 10x200 values
is not much data and can be processed without Hadoop.


On Wed, Mar 21, 2012 at 11:49 AM, shan s <[email protected]> wrote:

> In the relational database we have a large amount of key-value data in two
> tables. Let’s call them Entity and EntityAttribute.
>
> Table: Entity           Columns: EntityID, EntityType
>
> Table: EntityAttribute  Columns: EntityID, PropertyName, PropertyValue
>
> These entities are loosely related to each other, hence they are kept under
> a single roof.
>
> There are approximately 100 attributes across the entities and 20 different
> entity types.
>
> My questions are:
>
> - What is the best way to represent this kind of key-value pair data for
> processing with Pig?
>
> - Should I represent it as key=value pairs in text files? If so, how would
> I process such data in Pig?
>
> - Any pointers to UDFs that help with key-value pairs would be great.
>
> Many Thanks,
>
> Shan
>



-- 
Note that I'm no longer using my Yahoo! email address. Please email me at
[email protected] going forward.
