I agree with Prashant. I am hard pressed to find a case where AVG over chararrays would be useful, and I would much rather it fail at parse time than at run time.

2012/2/15 Prashant Kommireddi <[email protected]>

> AVG over chararrays is not a usual case, simply because it does not make
> sense in most cases. For example, what would the average be if it were a
> bag of first or last names? AVG would fail if it tried to convert String
> to Integer or Double.
>
> In your case it's best to declare it int/long if you know the data type
> beforehand.
>
> Thanks,
> Prashant
>
> 2012/2/15 Haitao Yao <[email protected]>
>
> > I solved this problem by extending the built-in AVG function to accept
> > a chararray bag as input and calculate the result.
> >
> > Why can't the built-in AVG accept a chararray bag, convert the values
> > to double, and calculate the result?
> >
> > On 2012-2-15, at 4:04 PM, Jonathan Coveney wrote:
> >
> > > The issue is that doing (int)b.x does not cast each column to an int,
> > > but rather, it tries to cast the bag itself. Short of flattening out
> > > the bag and projecting it as an int, which is inefficient, I suppose
> > > you could make a UDF that calculates the average of chararrays by
> > > casting to an int... but then that raises the question of why you
> > > couldn't just load it as an x:int in the first place.
> > >
> > > So generally, you need to do something like
> > > "foreach rel generate (int)x". In this case that doesn't work as
> > > efficiently, but this is kind of a weird case.
> > >
> > > 2012/2/14 Haitao Yao <[email protected]>
> > >
> > >> hi, all
> > >> here's my pig script:
> > >>
> > >> A = load 'input' as (b:bag{t:(x:int, y:int)});
> > >> B = foreach A generate AVG(b.x);
> > >> describe B;
> > >>
> > >> It works well.
> > >> If b.x is a chararray, the problems arise:
> > >>
> > >> A = load 'input' as (b:bag{t:(x:chararray, y:int)});
> > >> B = foreach A generate AVG((int)b.x);
> > >>
> > >> 2012-02-15 14:17:17,937 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1052:
> > >> <line 4, column 28> Cannot cast bag with schema :bag{:tuple(x:chararray)} to int
> > >> Details at logfile: /tmp/pig_1329286634873.log
> > >>
> > >> Why? How can I calculate the avg of b.x if b.x must be a chararray?
> > >>
> > >> Here's the running snapshot in Grunt:
> > >>
> > >> grunt> A = load 'input' as (b:bag{t:(x:int, y:int)});
> > >> grunt> B = foreach A generate AVG(b.x);
> > >> grunt> describe B;
> > >> B: {double}
> > >> grunt> A = load 'input' as (b:bag{t:(x:chararray, y:int)});
> > >> grunt> B = foreach A generate AVG((int)b.x);
> > >> 2012-02-15 14:17:17,937 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1052:
> > >> <line 4, column 28> Cannot cast bag with schema :bag{:tuple(x:chararray)} to int
> > >> Details at logfile: /tmp/pig_1329286634873.log
> > >> grunt>
> > >>
> > >> thanks.
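
For anyone following along, here is a rough sketch of the logic a chararray-averaging UDF like the one discussed above would need. It is plain Python rather than a registered Pig UDF, and it is not the code Haitao wrote. The function name `avg_chararray`, the modelling of a bag as a list of 1-tuples, and the choice to skip nulls are all assumptions made for illustration. The point is that the cast happens per element, not on the bag, and that bad input only shows up while running.

```python
def avg_chararray(bag):
    """Average the single field of each tuple in a bag of string values.

    `bag` models a Pig bag as a list of 1-tuples, e.g. [("1",), ("2",)].
    Returns None for an empty bag or a bag that holds only nulls.
    (Hypothetical helper, not part of Pig.)
    """
    total = 0.0
    count = 0
    for (value,) in bag:
        if value is None:
            # Skipping nulls is an assumption of this sketch.
            continue
        # The cast is applied to each element, not to the bag as a whole.
        # float() raises ValueError on input such as "Smith", so the
        # failure surfaces at run time rather than at parse time.
        total += float(value)
        count += 1
    return total / count if count else None


print(avg_chararray([("1",), ("2",), ("6",)]))  # 3.0
```

A bag of names, e.g. `avg_chararray([("Smith",)])`, raises `ValueError` partway through the data, which is the run-time failure mode the thread argues against.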
