Hi All,

I'm a beginner pig user and this is my first post to the Pig mailing list.

To answer your question, my first thought is that Pig may not be able to
join directly on values nested inside a bag like that.

However, you can first flatten the bag in A, then do the join, and then
group the result to get it back into the nested format you are looking for.
This may not be an ideal solution, but it should work.
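Here is a rough sketch of that flatten / join / regroup approach. It assumes the
schemas from your original message. The relation names (A_flat, J, by_thing,
per_thing, by_a1, R) are just illustrative, and I have not run this against your
actual data:

```pig
-- Sketch only: schemas follow the original post; relation names are illustrative.
A = LOAD 'data1' AS (a:tuple(a1:chararray, a2:bag{t:tuple(a2t:chararray)}));
B = LOAD 'data2' AS (a2t:chararray, x:chararray);

-- 1) Flatten the bag: one row per (a1, a2t) pair, so a2t becomes a top-level field.
A_flat = FOREACH A GENERATE a.a1 AS a1, FLATTEN(a.a2) AS a2t;

-- 2) Join on the now-flat key. This is an inner join, so a2t values with no
--    match in B are dropped (JOIN ... LEFT OUTER would keep them).
J = JOIN A_flat BY a2t, B BY a2t;

-- 3a) Collect the x values for each (a1, a2t) pair.
--     J's fields are A_flat::a1, A_flat::a2t, B::a2t, B::x, so $3 is B::x.
by_thing  = GROUP J BY (A_flat::a1, A_flat::a2t);
per_thing = FOREACH by_thing GENERATE
                group.$0 AS a1, group.$1 AS a2t, J.$3 AS xs;

-- 3b) Nest the (a2t, xs) tuples back under each a1.
by_a1 = GROUP per_thing BY a1;
R     = FOREACH by_a1 GENERATE group AS a1, per_thing.(a2t, xs) AS things;

DUMP R;
```

On your example data, this should give one record per a1 in roughly the shape
you described: (a1, {(a2t, {(x), ...}), ...}). Keep in mind that Pig does not
guarantee the order of tuples inside a bag.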

Pradeep


On Wed, May 22, 2013 at 8:49 AM, Ho Duc Ha <hodu...@gmail.com> wrote:

> We've got a data type that is modeled after a typical object-oriented
> data-model format (simple fields, and collections of other objects). We're
> trying to accomplish the following join:
>
> Here's our example input:
> -------------------------------------
> data1 = {  ( 'a1', { ('a2-thing1'), ('a2-thing2') } )  }
> data2 = {  ( 'a2-thing1', 'x-value1' ), ( 'a2-thing1', 'x-value2' )  }
>
> Here's what we want to get:
> --------------------------------------
> ( 'a1', { ( 'a2-thing1', { ('x-value1'), ('x-value2') } ) } )
>
> Notice that we are trying to join the collection of a2 values in the first
> data set on the first field of the second data set.
>
> We tried this:
> --------------------
> A = load 'data1' as ( a:tuple(a1:chararray, a2:bag{(a2t:chararray)}) );
> B = load 'data2' as ( a2t:chararray, x:chararray );
> X = join A by a2.a2t, B by a2t;
>
> We get this error:
> ---------------------------
> ERROR 1128: Cannot find field a2t in a1:chararray,a2:bag{:tuple(a2t:chararray)}
>
> Try as we might, we cannot find the right way to do this complex join.
> Questions:
>   1) Should we be simplifying our data format into a more SQL table-like
> structure and doing more joins to reduce the complexity?
>   2) How can we accomplish joining data2's data into the data1 "objects"?
>
> --
> Ho Duc Ha
>
