Hi,
This should just be a simple cogroup.
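-- load both files with the same (q, d) schema
A = load 'file1.txt' as (q:chararray, d:chararray);
B = load 'file2.txt' as (q:chararray, d:chararray);

-- cogroup on q; a q present in only one file gets an empty bag on the other side
counts = foreach (cogroup A by q, B by q) {
    -- smaller of the two per-q row counts (this compares counts only, not the d values)
    num_matches = MIN(TOBAG(COUNT(A), COUNT(B)));
    generate
        group as q,
        num_matches as num_matches;
};

dump counts;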
(q1,2)
(q2,0)
(q3,0)
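
One caveat: the cogroup on q alone only takes the smaller of the two row counts for each q, so it would report a match even when the d values differ between the files. If the d's themselves need to be compared, a rough sketch would be to cogroup on the (q, d) pair and then roll up by q. The aliases pairs, flagged and matched are just names picked for this example, and it assumes a repeated (q, d) line in a file should count once. Note also that load defaults to tab-delimited input, so a space-separated file would need something like using PigStorage(' ').

```pig
A = load 'file1.txt' as (q:chararray, d:chararray);
B = load 'file2.txt' as (q:chararray, d:chararray);

-- cogroup on the full (q, d) pair, so each group is one distinct pair
pairs = cogroup A by (q, d), B by (q, d);

-- a pair matches only if it appears in both files (neither bag is empty);
-- group.$0 is the q half of the (q, d) key
flagged = foreach pairs generate
    group.$0 as q,
    ((IsEmpty(A) or IsEmpty(B)) ? 0 : 1) as matched;

-- add up the matched pairs for each q
counts = foreach (group flagged by q) generate
    group as q,
    SUM(flagged.matched) as num_matches;

dump counts;
```

Because every (q, d) pair from either file still lands in some group, a q with no matching d's is kept with a count of 0 rather than dropped.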
--jacob
@thedatachef
On Jun 20, 2013, at 4:00 AM, Siddhi Borkar wrote:
> Hi,
>
> I have a problem in which I have to compare two files and get the
> count of matching attributes.
>
> For example:
> File 1: file1.txt
>
> q1 d1
> q1 d2
> q2 d3
> q2 d1
>
> File 2: file2.txt
> q1 d1
> q1 d2
> q3 d3
>
> What I need is, for each distinct q, the count of matching d's.
>
> For example, the output should be
> q1 2 (q1 d1 and q1 d2 are matching in both the
> files hence count is 2)
> q2 0 (has no d's matching)
> q3 0
>
> Any idea how this can be achieved?
>
> Thanks in advance
>
> -Sid