To be specific, he's looking for connected components in the graph.

It's not overly easy to do in an ETL tool (or in pig), but if you can
wrap the script in a loop it is possible.

There are actually some really interesting parallel algorithms for
finding connected components.  If you know you are only going to have
two keys, it is a bit simpler to code (but not any more efficient).
Essentially, you can implement your union-find algorithm as a series
of sorts and merges, which on a large parallel system is actually
quite fast.

Tanton

On Thu, Jun 10, 2010 at 3:33 PM, hc busy <[email protected]> wrote:
> What's a transitive closure?
>
> On Thu, Jun 10, 2010 at 2:34 PM, Gianmarco <[email protected]> wrote:
>
>> I think what he wants is a transitive closure of the relation, which
>> is not achievable in SQL-like languages alone (first order logic
>> expressive power).
>> I suppose Pig Latin falls in this category.
>>
>>
>> -- Gianmarco
>>
>> On Thu, Jun 10, 2010 at 19:54, hc busy <[email protected]> wrote:
>> > Is this like a tricky interview question? I don't see the pattern between
>> > those three numbers you listed and the sample of the table.
>> >
>> > 770011 770083 524 1e-120 89 12
>> > 770083 770011 494 1e-120 39 100
>> >
>> > ahh, I guess these are related because id1=id2 an id2=id1... Here's a
>> first
>> > pass at the problem. Project:
>> >
>> > P1 = foreach table generate id1 as id1, id2 as id2, *;
>> > P2 = foreach table generate id2 as id1, id1 as id2, *;
>> > J = join P1 by (id1, id2), P2 by (id1,id2);
>> >
>> > and now J contains pairs of rows from original table where id1 and id2
>> are
>> > reversed.
>> >
>> > is this what you want?
>> >
>> > On Wed, Jun 9, 2010 at 6:54 PM, Renato Marroquín Mogrovejo <
>> > [email protected]> wrote:
>> >
>> >> Hi everyone, today I came across with a particular query that I don't
>> know
>> >> how to model in PIG. Part of my data looks like this:
>> >>
>> >> Id1 Id2 Sc Va P1 P2
>> >> --------- --------- ----- --------- ----- ----
>> >> 770011 990201 401 1e-125 100 65
>> >> 990201 770011 440 1e-125 100 42
>> >> 770011 770083 524 1e-120 89 12
>> >> 770083 770011 494 1e-120 39 100
>> >> 990201 770083 341 1e-125 73 41
>> >> 770083 990201 421 1e-125 90 85
>> >> .
>> >> .
>> >> .
>> >>
>> >> what I would like to retrieve is something like
>> >> this:                                             770011 990201 770083
>> >> because they are records actually related.
>> >> Any kind of ideas are highly appreciated. Thanks in advanced.
>> >>
>> >> Renato M.
>> >>
>> >
>>
>

Reply via email to