Hello, I have a parent/child relation, and I would like to get a sample of this data using Pig's SAMPLE.
My question is whether COGROUP is the optimal way of getting the child records of the sampled parent data.

orders = LOAD 'orders' AS (order_id, order_date);
order_details = LOAD 'order_details' AS (order_id, product_id, quantity);

sample_orders = SAMPLE orders 0.01;

grpd = COGROUP sample_orders BY order_id, order_details BY order_id;

-- I want to get only the order_details into a relation, so that I only
-- have the order_detail fields in the sample_order_details
sample_order_details = FOREACH grpd GENERATE FLATTEN(order_details);

STORE sample_orders INTO 'sample_orders';
STORE sample_order_details INTO 'sample_order_details';

Is this a reasonable way of getting a sample of parent records, then getting the corresponding child records of the sampled parent records? JOIN seems difficult or unwieldy because I would need to project only the order_details fields from the joined relation.

Thanks,
--Nate
