Hi,

This should just be a simple cogroup.

A = load 'file1.txt' as (q:chararray, d:chararray);
B = load 'file2.txt' as (q:chararray, d:chararray);

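-- cogroup yields one row per distinct q, with that q's rows from each file in bags A and B
counts = foreach (cogroup A by q, B by q) {
           -- smaller of the two per-file row counts for this q
           num_matches = MIN(TOBAG(COUNT(A), COUNT(B)));
           generate
             group       as q,
             num_matches as num_matches;
         };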

dump counts;

(q1,2)
(q2,0)
(q3,0)
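The MIN-of-COUNTs expression gives the expected result on this sample because every d listed under q1 appears in both files. If the files can list different d's for the same q, it counts rows rather than matches: a q with only d5 in file1 and only d6 in file2 would get 1 instead of 0. A sketch that counts actual (q, d) matches, still with a cogroup but keyed on the whole pair, might look like this. It assumes the same tab-delimited input as the load statements above. It counts each distinct matching pair once, even if that pair is repeated within a file. The aliases by_pair, pairs and matched are just illustrative names.

```pig
A = load 'file1.txt' as (q:chararray, d:chararray);
B = load 'file2.txt' as (q:chararray, d:chararray);

-- Cogroup on the whole (q, d) pair: one row per distinct pair seen in
-- either file, with the matching rows from each file in bags A and B.
by_pair = cogroup A by (q, d), B by (q, d);

-- A pair matches only if it shows up in both files (neither bag is empty).
pairs = foreach by_pair generate
          group.q as q,
          (IsEmpty(A) or IsEmpty(B) ? 0 : 1) as matched;

-- Sum the per-pair flags back up to one count per q; a q with no
-- matching pair still gets a row, with a sum of 0.
counts = foreach (group pairs by q) generate
           group              as q,
           SUM(pairs.matched) as num_matches;

dump counts;
```

Every q shows up in at least one (q, d) pair, so q2 and q3 still come out with 0 rather than being dropped. On the sample files this should give the same three counts as above, though the order of the dumped rows isn't guaranteed.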

--jacob
@thedatachef

On Jun 20, 2013, at 4:00 AM, Siddhi Borkar wrote:

> Hi,
> 
> I have a problem statement in which I have to compare two files and get the
> count of matching attributes.
> 
> For example:
> File 1: file1.txt
> 
> q1           d1
> q1           d2
> q2           d3
> q2           d1
> 
> File 2: file2.txt
> 
> q1           d1
> q1           d2
> q3           d3
> 
> Now what I need is, for each distinct q, the count of matching d's.
> 
> For example, the output should be:
> q1           2  (q1 d1 and q1 d2 match in both files, hence the count is 2)
> q2           0  (no matching d's)
> q3           0
> 
> Any idea how this can be achieved?
> 
> Thanks in advance
> 
> -Sid
