Hi Alan & others,

I am using the PigMix patch at https://issues.apache.org/jira/browse/PIG-200 and want to generate test data and run the PigMix queries on it. As I understand it, the shell scripts in the patch are intended to generate data for the PigMix queries. I have been able to adapt the shell scripts, map-reduce jobs, and PigMix queries to our cluster environment. I faced a few problems because of hard-coded paths, but resolved most of them.

I still have one point of confusion, though. I believe there is a one-to-one correspondence between the test data files generated by the shell script and the files loaded by the Pig queries, and I wanted to verify that this is the case. According to my understanding, the correspondence is as follows:

generate_data.sh      pigmix
=============================
page_views         -> pages10m
widerow            -> widerow1m
power_users        -> power_users, power_users10m (could either be used?)
users              -> users, users10m (could either be used?)

Is my understanding correct? Since the generated data is random, I could not verify this manually by checking the schema inside the files.

Thanks,
Ashutosh