Hello experienced users, I am new to Pig and have what is probably a beginner's question: is it possible to get the original fields of a relation back after a join?
Suppose I have a relation A that I want to filter using data from relation B. To find the matching records, I join the two relations and then apply a filter. After that, I would like to keep only the fields that came from relation A.

Practical example:

    dirtydata = load '/data/0120422' using AvroStorage();
    sodtr = filter dirtydata by TransactionBlockNumber == 1;
    sto = foreach sodtr generate Dob.Value as Dob, StoreId, Created.UnixUtcTime;
    g = GROUP sto BY (Dob, StoreId);
    sodtime = FOREACH g GENERATE group.Dob AS Dob, group.StoreId as StoreId, MAX(sto.UnixUtcTime) AS latestStartOfDayTime;
    joined = join dirtydata by (Dob.Value, StoreId) LEFT OUTER, sodtime by (Dob, StoreId);
    cleandata = filter joined by dirtydata::Created.UnixUtcTime >= sodtime::latestStartOfDayTime;
    finaldata = FOREACH cleandata generate dirtydata:: ; -- <-- HERE

On the line marked HERE I would like to get just the columns that belonged to the original relation. The Avro schema is rather complicated, so it is not feasible to name all the columns here.

What is the best practice in that case? Is there a function for this? Or is there a completely different approach to this kind of task?

Thanks a lot for any help,
Jakub