I am using the -tagsource option of PigStorage when loading my input data so
that I can identify which file each record came from. The problem is that
when I later project only selected fields from the input tuple, certain
fields always show up in the output even though I leave them out of the
projection. Here is my script:

rawdata = load 'data/201212*' using PigStorage(' ', '-tagsource') as
    (filename:chararray, ts:int, ip:chararray, domain:chararray,
     answer:chararray);

A = foreach rawdata generate ts, ip, domain, answer,
    CONCAT(CONCAT(filename, '_'), UPPER(SUBSTRING(domain, 0, 1)))
        as domain_index,
    filename as filename;

B = foreach A generate ip as ip,
    SUBSTRING(domain, 0, 1) as domain_first_char,
    filename;

dump A;
dump B;
ILLUSTRATE B;

When creating B, I project only selected fields from A. However, when I
dump B, the 'ts' field (the first field in A) still appears in the output.
ILLUSTRATE B, on the other hand, shows exactly the fields I expect.
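
To make the mismatch concrete, here is a small sketch of how the declared
schema of B could be checked with DESCRIBE. The schema in the comment is
what I would expect from the script above, not captured output:

describe B;
-- expected schema (an assumption based on the foreach that defines B):
-- B: {ip: chararray, domain_first_char: chararray, filename: chararray}

If DESCRIBE reports these three fields while dump B still prints 'ts', the
discrepancy would be between the declared schema and the data actually read
at run time.
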
I appreciate any help. Thanks!
--
Prabu D