You can do the SPLIT outside the nested FOREACH. I'm assuming you have a UDF defined for VALID.
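For illustration only, since the VALID UDF itself is not shown anywhere in this thread: below is a rough Python sketch (not Pig) of the result the original question asks for, run on the sample rows from that question. The names `is_valid` and `split_records` are hypothetical, and the rule "a coordinate is invalid when it is null or 0.0" is an assumption read from the question. The sketch splits record by record, whereas a VALID UDF called on `data` would see the whole ordered bag for a group, so it would have to apply a check like this across the bag.

```python
from itertools import groupby
from operator import itemgetter

# Sample rows from the original question: (msisdn, longitude, latitude, ts).
RAW_RECORDS = [
    (1, 20.30, 40.50, 123),
    (1, 0.0, None, 456),
    (2, 60.70, 34.67, 678),
    (2, None, None, 978),
]


def is_valid(longitude, latitude):
    """Assumed rule: a record is valid only if both coordinates are non-null and non-zero."""
    return all(v is not None and v != 0.0 for v in (longitude, latitude))


def split_records(records):
    """Group by msisdn, order each group by ts, then split record by record."""
    valid, invalid = [], []
    # Sorting by (msisdn, ts) yields both the grouping and the per-group ordering.
    ordered = sorted(records, key=itemgetter(0, 3))
    for _msisdn, group in groupby(ordered, key=itemgetter(0)):
        for record in group:
            target = valid if is_valid(record[1], record[2]) else invalid
            target.append(record)
    return valid, invalid


valid_records, invalid_records = split_records(RAW_RECORDS)
```

With the sample input, the first row of each msisdn lands in `valid_records` and the second in `invalid_records`, each side still ordered by ts within its msisdn.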
So, your script can be written as:

rawRecords = LOAD '/data' as ...;
grouped = GROUP rawRecords BY msisdn;
validAndNotValidRecords = FOREACH grouped {
    ordered = ORDER rawRecords BY ts;
    GENERATE group as group_key, ordered as data;
};
SPLIT validAndNotValidRecords INTO validRecords IF VALID(data), invalidRecords OTHERWISE;

On Tue, Jul 23, 2013 at 8:58 AM, Serega Sheypak <serega.shey...@gmail.com> wrote:

> Omg, thanks it's exactly the thing I need.
>
> I can't do it before GROUP. I need group by key, then sort by timestamp
> field inside each group.
> After sort is done I can determine non valid records.
> I've provided simplified case.
>
> The only problem is that SPLIT is not allowed in nested FOREACH statement.
>
>
> 2013/7/23 Pradeep Gollakota <pradeep...@gmail.com>
>
> > You can use the SPLIT operator to split a relation into two (or more)
> > relations. http://pig.apache.org/docs/r0.11.1/basic.html#SPLIT
> >
> > Also, you should probably do this before GROUP. As a best practice (and
> > general pig optimization strategy), you should filter (and project) early
> > and often.
> >
> >
> > On Tue, Jul 23, 2013 at 4:27 AM, Serega Sheypak <serega.shey...@gmail.com> wrote:
> >
> > > Hi, I have rather simple problem and I can't create nice solution.
> > > Here is my input:
> > > msisdn  longitude  latitude  ts
> > > 1       20.30      40.50     123
> > > 1       0.0        null      456
> > > 2       60.70      34.67     678
> > > 2       null       null      978
> > >
> > > I need:
> > > group by msisdn
> > > order by ts inside each group
> > > filter records in each group:
> > > 1. put all records where longitude, latitude are valid on one side
> > > 2. put all records where longitude/latitude = 0.0/null to the other side
> > >
> > > Here is pig pseudo-code:
> > > rawRecords = LOAD '/data' as ...;
> > > grouped = GROUP rawRecords BY msisdn;
> > > validAndNotValidRecords = FOREACH grouped{
> > >     ordered = ORDER rawRecords BY ts;
> > >     --do something here to filter valid and not valid records....
> > > }
> > > STORE notValidRecords INTO /not_valid_data;
> > >
> > > someOtherProjection = GROUP validRecords By msisdn;
> > > --continue to work with filtered valid records...
> > >
> > > Can I do it in a single pig script, or I need to create two scripts:
> > > the first one would filter not valid records and store them
> > > the second one will continue to process filtered set of records?