It really depends on the depth of your graph.  Are you only dealing
with connected components of depth 3 or could they be deeper.

For instance, can you have x -> y, y -> z, and z -> a?  If so, do you
need your record to be x, y, z, a?  Or, are you guaranteed that it
will always be x -> z and y -> z?

I really need more information about your data before I can offer too
much advice.

On Thu, Jun 10, 2010 at 5:38 PM, Renato Marroquín Mogrovejo
<[email protected]> wrote:
> Hi everybody, thanks a lot for your responses.
>
> I am actually not looking for a transitive closure, I am not trying to
> "infer" that given *x -> y* and *y -> z*, then *x -> z *(@hc.busy: that is
> in short terms a transitive closure ^^)  because I have the data that proves
> it. And yeah I am aware of first logic expressive power, but maybe I will
> think about giving an inference engine a try some time in the future.
>
> I am actually looking at a connected graph-like problem. The sample records
> resemble a bidirectional triangle.
>
>            990201
>         /              \
>        /                \
>       /                  \
>      /                    \
> 770011  ------- 770083
>
> I tried using a smaller version of the problem by using SQL and got a
> horribly huge query which is a non-scalable possibility for me. I have over
> 500Gbs in structured files. I used a triple join but I had to retrieve all
> existing possibilities.
> So that is why I thought on using something like Hc.busy mentioned.
>
> Queries: idx1, idx2, sc, va, p1, p2
> possibilities: possibs = foreach queries generate idx1, flatten(a triple
> self join of the data)
>
> and by using the flattening command, get all possible combinations, but I am
> not sure that would be the correct approach. Anyhow I thought maybe
> performing some breadth first search but that would give me an algorithm
> O(n2)  so  )=
> Hey Tanton how would you implement and use an union-find structure? Do you
> think it is possible with PIG?
>
> Thanks again.
>
>
> 2010/6/10 Tanton Gibbs <[email protected]>
>
>> To be specific, he's looking for connected components in the graph.
>>
>> It's not overly easy to do in an ETL tool (or in pig), but if you can
>> wrap the script in a loop it is possible.
>>
>> There are actually some really interesting parallel algorithms for
>> finding connected components.  If you know you are only going to have
>> two keys, it is a bit simpler to code (but not any more efficient).
>> Essentially, you can implement your union-find algorithm as a series
>> of sorts and merges, which on a large parallel system is actually
>> quite fast.
>>
>> Tanton
>>
>> On Thu, Jun 10, 2010 at 3:33 PM, hc busy <[email protected]> wrote:
>> > What's a transitive closure?
>> >
>> > On Thu, Jun 10, 2010 at 2:34 PM, Gianmarco <[email protected]>
>> wrote:
>> >
>> >> I think what he wants is a transitive closure of the relation, which
>> >> is not achievable in SQL-like languages alone (first order logic
>> >> expressive power).
>> >> I suppose Pig Latin falls in this category.
>> >>
>> >>
>> >> -- Gianmarco
>> >>
>> >> On Thu, Jun 10, 2010 at 19:54, hc busy <[email protected]> wrote:
>> >> > Is this like a tricky interview question? I don't see the pattern
>> between
>> >> > those three numbers you listed and the sample of the table.
>> >> >
>> >> > 770011 770083 524 1e-120 89 12
>> >> > 770083 770011 494 1e-120 39 100
>> >> >
>> >> > ahh, I guess these are related because id1=id2 an id2=id1... Here's a
>> >> first
>> >> > pass at the problem. Project:
>> >> >
>> >> > P1 = foreach table generate id1 as id1, id2 as id2, *;
>> >> > P2 = foreach table generate id2 as id1, id1 as id2, *;
>> >> > J = join P1 by (id1, id2), P2 by (id1,id2);
>> >> >
>> >> > and now J contains pairs of rows from original table where id1 and id2
>> >> are
>> >> > reversed.
>> >> >
>> >> > is this what you want?
>> >> >
>> >> > On Wed, Jun 9, 2010 at 6:54 PM, Renato Marroquín Mogrovejo <
>> >> > [email protected]> wrote:
>> >> >
>> >> >> Hi everyone, today I came across with a particular query that I don't
>> >> know
>> >> >> how to model in PIG. Part of my data looks like this:
>> >> >>
>> >> >> Id1 Id2 Sc Va P1 P2
>> >> >> --------- --------- ----- --------- ----- ----
>> >> >> 770011 990201 401 1e-125 100 65
>> >> >> 990201 770011 440 1e-125 100 42
>> >> >> 770011 770083 524 1e-120 89 12
>> >> >> 770083 770011 494 1e-120 39 100
>> >> >> 990201 770083 341 1e-125 73 41
>> >> >> 770083 990201 421 1e-125 90 85
>> >> >> .
>> >> >> .
>> >> >> .
>> >> >>
>> >> >> what I would like to retrieve is something like
>> >> >> this:                                             770011 990201
>> 770083
>> >> >> because they are records actually related.
>> >> >> Any kind of ideas are highly appreciated. Thanks in advanced.
>> >> >>
>> >> >> Renato M.
>> >> >>
>> >> >
>> >>
>> >
>>
>

Reply via email to