Hi Cheolsoo,
this is because I have a 24-dimensional tuple, and writing out its schema
alone is a pain. It makes my code unreadable and harder to interpret or fix:
imagine how many errors you can make there.
I would prefer to solve this within Python, so that my Pig calls do not
get too complicated and possibly messy.
Thanks,
Björn-Elmar
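One way to avoid typing a 24-field schema by hand would be to generate the string. This is only a sketch: the helper name `histogram_schema` and the field names `h0`..`h23` are made up for illustration, and the result would still have to be pasted into the load statement or handed to the script through Pig's parameter substitution.

```python
def histogram_schema(n_fields, prefix="h"):
    """Build a Pig schema string such as '(h0:int, h1:int, h2:int)'.

    The prefix and the int type are assumptions for this example.
    """
    fields = ["%s%d:int" % (prefix, i) for i in range(n_fields)]
    return "(" + ", ".join(fields) + ")"


# Schema for a 24-bin histogram; printed so it can be copied into a script.
schema_24 = histogram_schema(24)
print(schema_24)
```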
On 31.10.12 05:59, Cheolsoo Park wrote:
Hi,
First of all, why can't you pass a tuple of integers to your UDF in the
first place? That way you don't have to cast strings to integers inside
your UDF.
Here is how I got your udf working.
cheolsoo@localhost:~/workspace/pig-trunk $cat 1.txt
1,2,3
4,5,6
cheolsoo@localhost:~/workspace/pig-trunk $cat test.pig
register 'test.py' using jython as myfuncs;
a = load '1.txt' using PigStorage(',') as (i:int, j:int, k:int); -- declare as integers
b = group a all;
c = foreach b generate myfuncs.aggHisto(a);
dump c;
# test.py
@outputSchema("res_histo:tuple()")
def aggHisto(aHistogramSet):
    if aHistogramSet is None:
        return None
    hist_len = len(aHistogramSet[0])
    result = [0] * hist_len
    print(aHistogramSet)
    for aHistogram in aHistogramSet:
        for i in range(0, hist_len):
            result[i] = result[i] + aHistogram[i]  # vector addition
    return tuple(result)
I get the following result:
((5,7,9))
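The element-wise sum can also be checked in plain Python without running Pig. The following is a sketch only: it assumes the bag behaves like a list of equal-length integer tuples, leaves out the @outputSchema decorator, and uses the made-up name `agg_histo`. Unlike the UDF above, it also returns None for an empty bag.

```python
def agg_histo(histograms):
    """Sum equal-length integer tuples position by position."""
    if not histograms:
        return None
    # zip(*histograms) groups the i-th entries of all tuples together.
    return tuple(sum(column) for column in zip(*histograms))


print(agg_histo([(1, 2, 3), (4, 5, 6)]))  # (5, 7, 9)
```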
Thanks,
Cheolsoo
On Tue, Oct 30, 2012 at 10:22 AM, Björn-Elmar Macek
<[email protected]> wrote:
Hi all,
I have a UDF that sums up histograms given as tuples. The function I
wrote looks like this:
@outputSchema("res_histo:tuple()")
def aggHisto(aHistogramSet):
    if aHistogramSet is None: return None;
    hist_len = len(aHistogramSet[0])
    result=[0]*hist_len
    for aHistogram in aHistogramSet:
        for i in range(0,hist_len):
            value = int(''.join(map(str,aHistogram[i])));
            result[i] = result[i] + (value)
    return tuple(result)
So for the input {(1,23,45),(0,0,0)} I SHOULD get the following
output: (1,23,45)
But instead I get: (49,5051,52,5353)
I played around with this for some time and found that the program does
the following:
The line "value = int(''.join(map(str,aHistogram[i])));" does not
convert the "23" to 23. Instead,
it takes every single digit, starting with the most significant one, and
adds 48 to it: 2+48=50 and 3+48=51, resulting in 5051.
Why does this happen? Can anybody help me here?
Best regards,
Elmar
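The "adds 48" pattern described above matches ASCII codes: the character '0' has code 48, so '2' is 50 and '3' is 51. Here is a minimal sketch of that effect. It assumes the untyped field reaches the UDF as a sequence of bytes (Pig treats fields without a declared type as bytearray), and a plain Python bytearray stands in for whatever object Jython actually passes.

```python
# Hypothetical stand-in for an untyped Pig field holding the text "23".
raw = bytearray(b"23")

# Iterating over a byte array yields integer byte values, not characters.
codes = list(raw)                      # [50, 51]

# str() on each byte value gives "50" and "51", which join to "5051".
wrong = int("".join(map(str, raw)))    # 5051

# Decoding the bytes to text first gives the intended number.
right = int(raw.decode("ascii"))       # 23

print(codes, wrong, right)
```

If this assumption holds, declaring the fields as int in the load statement, as in the reply above, avoids the conversion altogether.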