I have a bug that is killing me: I can't self-join/cross a dataset
with itself. It's blocking my work :(
The script looks like this:
businesses = LOAD
'yelp_phoenix_academic_dataset/yelp_academic_dataset_business.json' using
com.twitter.elephantbird.pig.load.JsonLoader() as json:map[];
/*
There was a bug in the script on the 2nd to last line. Fixed it, still have
same issue.
I found a workaround: if I store the CROSSED relation immediately after the
CROSS, then load it... it works. Something about resetting the plan. This
is a bug. I'll file a JIRA.
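For anyone hitting the same thing, the workaround described above might look roughly like the sketch below. It is not the original script: the 'pairs.tsv' path and the aliases locations_copy, pairs, pairs_reloaded and distinct_pairs are made up for illustration, and it assumes a locations relation with business_id, lat and lng has already been stored to 'locations.tsv'.

```pig
-- Sketch of the store-then-reload workaround (illustrative names, not the original script).
-- Assumes 'locations.tsv' already holds business_id, lat, lng as tab-separated fields.
locations = LOAD 'locations.tsv' AS (business_id:chararray, lat:double, lng:double);

-- CROSS needs two distinct aliases, so make a copy of the relation first.
locations_copy = FOREACH locations GENERATE *;
pairs = CROSS locations, locations_copy;

-- Workaround: store the crossed relation immediately after the CROSS ...
STORE pairs INTO 'pairs.tsv';

-- ... then load it back under a fresh schema before doing any further processing.
pairs_reloaded = LOAD 'pairs.tsv' AS (id_1:chararray, lat_1:double, lng_1:double,
                                      id_2:chararray, lat_2:double, lng_2:double);

-- Later steps (FILTER, FOREACH/GENERATE, etc.) work on the reloaded relation.
distinct_pairs = FILTER pairs_reloaded BY id_1 != id_2;
```

The cost is an extra write and read of the crossed data, which can be large for a full CROSS, so this is only a stopgap until the underlying issue is fixed.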
On Wed, Dec 4, 2013 at 1:21
I tried the following script (not exactly the same) and it worked correctly
for me.
businesses = LOAD 'dataset' using PigStorage(',') AS (a, b, c,
business_id: chararray, lat: double, lng: double);
locations = FOREACH businesses GENERATE business_id, lat, lng;
STORE locations INTO 'locations.tsv';
If you store immediately after the CROSS, it works. If you do another
FOREACH/GENERATE, etc. it does not.
On Wed, Dec 4, 2013 at 1:41 PM, Pradeep Gollakota pradeep...@gmail.com wrote:
I tried the following script (not exactly the same) and it worked correctly
for me.
businesses = LOAD