CROSS/Self-Join Bug - Please Help :(

2013-12-04 Thread Russell Jurney
I have a bug that is killing me: I can't self-join/CROSS a dataset with itself. It's blocking my work :( The script is like this: businesses = LOAD 'yelp_phoenix_academic_dataset/yelp_academic_dataset_business.json' using com.twitter.elephantbird.pig.load.JsonLoader() as json:map[]; /*

Re: CROSS/Self-Join Bug - Please Help :(

2013-12-04 Thread Russell Jurney
There was a bug in the script on the second-to-last line. I fixed it and still have the same issue. I found a workaround: if I STORE the CROSSed relation immediately after the CROSS and then LOAD it, it works. Something about resetting the plan. This is a bug. I'll file a JIRA. On Wed, Dec 4, 2013 at 1:21

Re: CROSS/Self-Join Bug - Please Help :(

2013-12-04 Thread Pradeep Gollakota
I tried the following script (not exactly the same) and it worked correctly for me. businesses = LOAD 'dataset' using PigStorage(',') AS (a, b, c, business_id: chararray, lat: double, lng: double); locations = FOREACH businesses GENERATE business_id, lat, lng; STORE locations INTO 'locations.tsv';

Re: CROSS/Self-Join Bug - Please Help :(

2013-12-04 Thread Russell Jurney
If you STORE immediately after the CROSS, it works. If you do another FOREACH/GENERATE, etc., it does not. On Wed, Dec 4, 2013 at 1:41 PM, Pradeep Gollakota pradeep...@gmail.com wrote: I tried the following script (not exactly the same) and it worked correctly for me. businesses = LOAD
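
For readers skimming the thread, the workaround Russell describes (STORE the CROSSed relation right after the CROSS, then LOAD it before doing any further FOREACH/GENERATE) might look roughly like the sketch below. It is not Russell's actual script, which is truncated in the archive. The input path and schema are borrowed from Pradeep's simplified example, and the relation names locations_2, crossed, reloaded and pairs, the renamed *_2 fields, and the paths 'crossed_tmp' and 'pairs' are all hypothetical.

```pig
-- Input and schema follow Pradeep's simplified script, not the original Yelp JSON load.
businesses = LOAD 'dataset' USING PigStorage(',')
    AS (a, b, c, business_id: chararray, lat: double, lng: double);
locations = FOREACH businesses GENERATE business_id, lat, lng;

-- Assumption: make a second copy with renamed fields so the two sides
-- of the self-CROSS can be told apart.
locations_2 = FOREACH locations GENERATE
    business_id AS business_id_2, lat AS lat_2, lng AS lng_2;

crossed = CROSS locations, locations_2;

-- Workaround from the thread: STORE immediately after the CROSS ...
STORE crossed INTO 'crossed_tmp';

-- ... then LOAD the stored result and continue from the reloaded relation
-- ('crossed_tmp' is a hypothetical scratch path).
reloaded = LOAD 'crossed_tmp'
    AS (business_id: chararray, lat: double, lng: double,
        business_id_2: chararray, lat_2: double, lng_2: double);

-- Further FOREACH/GENERATE steps go against the reloaded data.
pairs = FOREACH reloaded GENERATE business_id, business_id_2;
STORE pairs INTO 'pairs';
```

The only point of the sketch is the ordering: nothing sits between the CROSS and the STORE, and every later transformation reads from the reloaded relation rather than from crossed.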