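I found in another message that starting Pig with the flag '-t ColumnMapKeyPrune' works around this issue. '-t' is the short form of '-optimizer_off', so this disables the ColumnMapKeyPrune optimizer rule. In other words, start Pig using the command:

    pig -x local -t ColumnMapKeyPrune sample.pig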
On Sun, Feb 3, 2013 at 12:17 PM, Prabu Dhakshinamurthy <[email protected]> wrote:
> Dump of A:
> (100,123.98.11.123,google.com,{(google)},20121201_G,20121201)
> (95,500.98.11.123,yahoo.com,{(yahoo)},20121201_Y,20121201)
> (107,123.98.11.123,google.com,{(google)},20121201_G,20121201)
> (156,123.98.11.123,cnn.com,{(cnn)},20121201_C,20121201)
> (100,500.98.11.123,ndtv.com,{(ndtv)},20121201_N,20121201)
> (200,123.98.11.123,google.com,{(google)},20121202_G,20121202)
> (283,500.98.11.123,yahoo.com,{(yahoo)},20121202_Y,20121202)
> (283,500.98.11.123,pinterest.com,{(pinterest)},20121202_P,20121202)
> (204,600.10.100.221,bbc.com,{(bbc)},20121202_B,20121202)
>
> Dump of B:
> (100,g,20121201)
> (95,y,20121201)
> (107,g,20121201)
> (156,c,20121201)
> (100,n,20121201)
> (200,g,20121202)
> (283,y,20121202)
> (283,p,20121202)
> (204,b,20121202)
>
> ILLUSTRATE B:
>
> | B | ip:chararray  | domain_first_char:chararray | filename:chararray |
> |   | 123.98.11.123 | g                           | 20121202           |
>
> As seen in Dump of B, the first field holds the ts value instead of the ip
> value (which ILLUSTRATE B correctly shows as the first field).
>
> On Sun, Feb 3, 2013 at 11:56 AM, Prabu Dhakshinamurthy
> <[email protected]> wrote:
>>
>> I am using the -tagsource option while loading the input data in order to
>> identify the input source. It seems that later, when I project only
>> selected fields from the input tuple, Pig makes some assumptions and
>> certain fields always get projected even though I try to drop them.
>>
>> Take a look at my script.
>>
>> rawdata = load 'data/201212*' using PigStorage(' ', '-tagsource')
>>     as (filename:chararray, ts:int, ip:chararray, domain:chararray, answer:chararray);
>>
>> A = foreach rawdata generate ts, ip, domain, answer,
>>     CONCAT(CONCAT(filename, '_'), UPPER(SUBSTRING(domain, 0, 1))) as domain_index,
>>     filename as filename;
>> B = foreach A generate ip as ip, SUBSTRING(domain, 0, 1) as domain_first_char, filename;
>> dump A;
>> dump B;
>> ILLUSTRATE B;
>>
>> While creating B, I am trying to include only selected fields from A.
>> However, if I dump B, the 'ts' field (the first field in A) keeps appearing
>> in B. In ILLUSTRATE B, though, everything looks as expected.
>>
>> I appreciate any help. Thanks!
>>
>> --
>> Prabu D
>
> --
> Prabu Dhakshinamurthy
> Graduate student | CSE | UCSD
