That looks to have worked. Thanks.

On Wed, Dec 08, 2010 at 02:04:07PM -0800, Dmitriy Ryaboy wrote:
> Try explicitly casting argMap#'s' to a chararray?
>
> On Wed, Dec 8, 2010 at 1:53 PM, Kris Coward <[email protected]> wrote:
>
> > Hi,
> >
> > I've recently gotten stumped by a problem where my attempts to dump the
> > relations produced by a GROUP command give the following error (though
> > illustrating the same relation works fine):
> >
> > java.io.IOException: Type mismatch in key from map: expected
> > org.apache.pig.impl.io.NullableBytesWritable, recieved
> > org.apache.pig.impl.io.NullableText
> >     at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.collect(MapTask.java:807)
> >     at org.apache.hadoop.mapred.MapTask$OldOutputCollector.collect(MapTask.java:466)
> >     at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapReduce$Map.collect(PigMapReduce.java:108)
> >     .
> >     .
> >     .
> >
> > for a little background, the relation that's failing is called y5, and
> > is produced by the following string of commands (in grunt):
> >
> > y2 = foreach y1 generate $0 as timestamp, myudfs.httpArgParse($1) as argMap;
> > y3 = foreach y2 generate argMap#'s' as uid, timestamp as timestamp;
> > y4 = FILTER y3 BY (uid is not null);
> > y5 = GROUP y4 BY uid;
> >
> > and to get an idea what sort of data is involved, ILLUSTRATE y4 yields:
> >
> > -------------------------------------------------------------------------------------------------
> > | y1 | timestamp: int | args: bag({tuple_of_tokens: (token: chararray)})                        |
> > -------------------------------------------------------------------------------------------------
> > |    | 1265950806     | {(s=1381688313), (u=F68FFA1F655FDF494ABA520D95E1D99E), (ts=1265950805)} |
> > -------------------------------------------------------------------------------------------------
> > -------------------------------------------------------------------------------------------
> > | y2 | timestamp: int | argMap: map                                                       |
> > -------------------------------------------------------------------------------------------
> > |    | 1265950806     | {u=F68FFA1F655FDF494ABA520D95E1D99E, ts=1265950805, s=1381688313} |
> > -------------------------------------------------------------------------------------------
> > ----------------------------------------
> > | y3 | uid: bytearray | timestamp: int |
> > ----------------------------------------
> > |    | 1381688313     | 1265950806     |
> > ----------------------------------------
> > ----------------------------------------
> > | y4 | uid: bytearray | timestamp: int |
> > ----------------------------------------
> > |    | 1381688313     | 1265950806     |
> > ----------------------------------------
> >
> > The same problem was also produced when the FILTER command was omitted,
> > and the relevant chunk of code in myudfs.httpArgParse is:
> >
> >     StringTokenizer tok = new StringTokenizer((String)pair, "=", false);
> >     if (tok.hasMoreTokens() ) {
> >         String oKey = tok.nextToken();
> >         if (tok.hasMoreTokens() ) {
> >             Object oValue = tok.nextToken();
> >             output.put(oKey, oValue);
> >         } else {
> >             output.put(oKey, null);
> >         }
> >     }
> >
> > If anyone has any insight how I could get this to work, that'd really
> > help me out.
> >
> > Thanks,
> > Kris
> >
> > P.S. For those who remember my earlier post about getting httpArgParse
> > to compile, I took the advice to ditch the InternalMap in favour of a
> > HashMap<String,Object>
> >
> > --
> > Kris Coward                                     http://unripe.melon.org/
> > GPG Fingerprint: 2BF3 957D 310A FEEC 4733  830E 21A4 05C7 1FEB 12B3
> >
--
Kris Coward                                     http://unripe.melon.org/
GPG Fingerprint: 2BF3 957D 310A FEEC 4733  830E 21A4 05C7 1FEB 12B3
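
For anyone finding this thread later, the cast suggested above would look roughly like the following in grunt. This is only a sketch: the relation and field names are taken from the quoted script, and the exact statement that was run to confirm the fix is not shown in the thread. The likely cause is that Pig treats the value of an untyped map lookup as a bytearray (see the y3 and y4 schemas above), while the UDF actually stores Java Strings, so the map phase emits a text key where a bytes key was planned.

```
-- Cast the map lookup so the declared type matches the String the UDF stored.
y3 = foreach y2 generate (chararray)argMap#'s' as uid, timestamp as timestamp;
y4 = FILTER y3 BY (uid is not null);
y5 = GROUP y4 BY uid;
```

With the cast in place, uid should be reported as a chararray in y3 and y4 rather than a bytearray, so the GROUP key type agrees with the data.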
