Hi all,

I'm walking through a Pig script in Grunt, but I'm stuck on an issue with a
nested foreach. I'm using Pig version 0.9.2.

I'm trying to find the number of unique users in the 'users' bag of each
record in the relation 'top100':

grunt> describe top100
top100: {name: chararray,licenses: long,instance: chararray,transactions: long,users: {(projected::userId: chararray)},runTimes: {(projected::runTime: double)}}

grunt> uu = foreach top100 {
>> uniqUsers = distinct users;
>> generate uniqUsers as uniqUsers;
>> }
ERROR 1200: Pig script failed to parse:
<line 132, column 9> Invalid scalar projection: uniqUsers : A column needs to be projected from a relation for it to be used as a scalar

I realized that I had already defined a relation named uniqUsers earlier in
the session, but I didn't think it would conflict with an alias inside the
nested foreach block. The schema for that earlier uniqUsers is:

grunt> describe uniqUsers
uniqUsers: {key: chararray,uniqUsers: long}

When I use a different alias for the result of distinct, it seems to work:

grunt> uu = foreach top100 {
>> un = distinct users;
>> generate un as uniqUsers;
>> }
grunt> describe uu
uu: {un: {(projected::userId: chararray)}}
grunt> uu = foreach top100 {
>> un = distinct users;
>> generate COUNT(un) as uniqUsers;
>> }
grunt> describe uu
uu: {uniqUsers: long}
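
For anyone who wants to reproduce this, here is a minimal sketch of the
working version as a standalone script. The input path 'top100_data' is
hypothetical, and I've written the bag schemas without the projected::
prefix, so treat it as an approximation of my relation rather than the
exact one.

```pig
-- Hypothetical input path; the schema mirrors the describe output above,
-- minus the projected:: prefixes (an assumption made for this sketch).
top100 = LOAD 'top100_data' AS (
    name: chararray,
    licenses: long,
    instance: chararray,
    transactions: long,
    users: bag {t: tuple(userId: chararray)},
    runTimes: bag {t: tuple(runTime: double)}
);

-- 'un' is deliberately a name not used by any top-level relation.
uu = FOREACH top100 {
    un = DISTINCT users;
    GENERATE name, COUNT(un) AS uniqUsers;
};

DESCRIBE uu;
```

The only point of the sketch is that the alias inside the nested block
('un') does not match any relation already defined in the session.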

I was curious, so I tried the following, but I do not understand what the
results are.

grunt> u2 = foreach top100 {
>> uniqUsers = distinct users;
>> generate uniqUsers.key;
>> }
grunt> describe u2
u2: {projected::userId: chararray}

grunt> u3 = foreach top100 {
>> uniqUsers = distinct users;
>> generate uniqUsers.uniqUsers;
>> }
grunt> describe u3
u3: {projected::userId: chararray}

Specifically, what is actually in the result of u3? Why is it a chararray
when uniqUsers.uniqUsers is a long? Why is the alias still
projected::userId?

Thanks for any help!

-Chun

PS Sorry for the double post, I accidentally hit a keyboard shortcut for
Send.
