Hi Johannes,

Try this:

C = LOAD 'in.dat' AS (A1);   -- previously generated data
A = LOAD 'in2.dat' AS (A1);  -- new data
-- keep every record of A; C's fields are null where there is no match
joined = JOIN A BY A1 LEFT OUTER, C BY A1;
DESCRIBE joined;
-- records of A with no counterpart in C
newEntries = FILTER joined BY C::A1 IS NULL;
DUMP newEntries;

Ruslan

On Wed, Jul 4, 2012 at 4:42 PM, Johannes Schwenk <[email protected]> wrote:
> Hi Alan,
>
> I'd like to use this method to not include records in my output that are
> already present in previously computed data. So I tried to use your
> suggestion like this:
>
> grunt> cat in.dat
> 1
> 2
> 3
> 4
> 5
> 6
> 7
> 8
> 9
> grunt> C = LOAD 'in.dat' AS (A1); -- previously generated data
> grunt> cat in2.dat
> 12
> 2
> 13
> 1
> 10
> 9
> 11
> 8
> grunt> A = LOAD 'in2.dat' AS (A1); -- new data
> grunt> B1 = join A by A1, C by A1;
> grunt> B2 = filter B1 by SIZE(C) == 0;
>
> Which gives me this error:
>
> 2012-07-04 14:36:16,768 [main] ERROR org.apache.pig.tools.grunt.Grunt -
> ERROR 1200: Pig script failed to parse:
> <line 14, column 23> Invalid scalar projection: C : A column needs to be
> projected from a relation for it to be used as a scalar
> Details at logfile: /home/schwenk/pig-0.10.0/pig_1341403702015.log
>
> The relevant pig stack trace from the logfile can be found at
>
> http://pastebin.com/MxPfduWS
>
> What am I doing wrong?
>
> Greetings,
> Johannes
>
> On 25.06.2012 18:39, Alan Gates wrote:
>> This type of IN is really a semi-join. So you could rewrite this as:
>>
>> B1 = join A by A1, C by A1;
>> B2 = filter B1 by SIZE(C) > 0;
>> B = foreach B2 flatten(A);
>>
>> Alan.
>>
>> On Jun 25, 2012, at 2:50 AM, yonghu wrote:
>>
>>> Dear all,
>>>
>>> In SQL, there is an IN clause, which is used to check whether a value
>>> is in a set. Does Pig also have such an IN clause? For example:
>>>
>>> B = filter A by A1 in C;
>>>
>>> A, B, C are relation names and A1 is a column name of A.
>>>
>>> Thanks!
>>>
>>> Yong
>>
>
> Johannes Schwenk
>
> --
> Software Developer (Reporting)
> ADITION technologies AG
> http://www.adition.com
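
A note on the error quoted in this thread: after a JOIN, the result has flat fields A::A1 and C::A1 and no bag named C, so Pig reads the bare C in SIZE(C) as a relation used as a scalar and rejects it. A COGROUP keeps A and C as bags, so a bag test does work there. The following is a minimal sketch of that alternative, assuming the same in.dat and in2.dat shown in the thread; the aliases grouped, onlyNew and result are made up for illustration.

```pig
C = LOAD 'in.dat' AS (A1);   -- previously generated data
A = LOAD 'in2.dat' AS (A1);  -- new data

-- COGROUP yields one row per key with fields (group, A, C); A and C are bags
grouped = COGROUP A BY A1, C BY A1;

-- keep only the keys that have no matching record in C
onlyNew = FILTER grouped BY IsEmpty(C);

-- unnest the A records for the surviving keys
result = FOREACH onlyNew GENERATE FLATTEN(A);
DUMP result;
```

With the sample data from the thread, the records 10, 11, 12 and 13 should remain (output order is not guaranteed). Unlike the LEFT OUTER JOIN version, the result carries only A's columns, so no extra projection is needed afterwards.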
