Using an alias within Nested Foreach causes indeterminate behaviour
-------------------------------------------------------------------
Key: PIG-1633
URL: https://issues.apache.org/jira/browse/PIG-1633
Project: Pig
Issue Type: Bug
Affects Versions: 0.7.0, 0.6.0, 0.5.0, 0.4.0
Reporter: Viraj Bhat
I have created a RANDOMINT function which generates random integers from 0 up
to, but not including, the specified value. For example, RANDOMINT(4) returns
random numbers between 0 and 3 (inclusive).
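To make the intended range concrete, here is a minimal Python sketch of the behaviour described above. It is not the actual UDF from math.jar, whose source is not shown in this report; the name random_int is hypothetical.

```python
import random

def random_int(upper, rng=random):
    """Return a random integer in [0, upper), mimicking the described RANDOMINT."""
    return rng.randrange(upper)

# Draw a batch of samples and show the observed range.
samples = [random_int(4) for _ in range(1000)]
print(min(samples), max(samples))
```

Every sample falls in 0..3, which matches the "RANDOMINT(4)" example.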
{code}
$ hadoop fs -cat rand.dat
f
g
h
i
j
k
l
m
{code}
The pig script is as follows:
{code}
register math.jar;
A = load 'rand.dat' using PigStorage() as (data);
B = foreach A {
    r = math.RANDOMINT(4);
    generate
        data,
        r as random,
        ((r == 3)?1:0) as quarter;
};
dump B;
{code}
The results are as follows:
{code}
{color:red}
(f,0,0)
(g,3,0)
(h,0,0)
(i,2,0)
(j,3,0)
(k,2,0)
(l,0,1)
(m,1,0)
{color}
{code}
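Notice that rows such as (j,3,0) and (l,0,1) are inconsistent: quarter should
be 1 exactly when random is 3. This seems to happen because r is defined in the
nested foreach but referenced twice in the generate clause, and the two
references yield different values.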
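The suspected cause can be illustrated outside Pig. The following Python sketch is only a simulation under the assumption that each reference to r re-invokes the function; it does not reflect Pig's actual execution plan, and all names in it are hypothetical.

```python
import random

def random_int(upper, rng):
    """Stand-in for RANDOMINT: a random integer in [0, upper)."""
    return rng.randrange(upper)

def row_reevaluated(data, rng):
    # Assumed buggy behaviour: every reference to "r" calls the function again,
    # so the "random" and "quarter" columns see independent draws.
    return (data, random_int(4, rng), 1 if random_int(4, rng) == 3 else 0)

def row_bound_once(data, rng):
    # Expected behaviour: r is computed once and reused for both columns.
    r = random_int(4, rng)
    return (data, r, 1 if r == 3 else 0)

def is_consistent(row):
    """quarter must be 1 exactly when the random column equals 3."""
    _, r, quarter = row
    return quarter == (1 if r == 3 else 0)

rng = random.Random(0)
for d in "fghijklm":
    row = row_reevaluated(d, rng)
    print(row, "ok" if is_consistent(row) else "INCONSISTENT")
```

With row_bound_once every row is consistent by construction, whereas row_reevaluated can emit rows like (j,3,0) or (l,0,1).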
Rewriting the script as shown below solves the issue. The M/R jobs generated
for both scripts are the same, so the nested form is only a matter of
convenience.
{code}
register math.jar;
A = load 'rand.dat' using PigStorage() as (data);
B = foreach A generate
    data,
    math.RANDOMINT(4) as r;
C = foreach B generate
    data,
    r,
    ((r == 3)?1:0) as quarter;
dump C;
{code}
Is this issue related to PIG-747?
Viraj