Hi,
I'm stumped by a problem: my attempts to DUMP the relation produced by a
GROUP command fail with the following error (though ILLUSTRATE on the
same relation works fine):
java.io.IOException: Type mismatch in key from map: expected org.apache.pig.impl.io.NullableBytesWritable, recieved org.apache.pig.impl.io.NullableText
    at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.collect(MapTask.java:807)
    at org.apache.hadoop.mapred.MapTask$OldOutputCollector.collect(MapTask.java:466)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapReduce$Map.collect(PigMapReduce.java:108)
.
.
.
For a little background, the failing relation is called y5, and is
produced by the following sequence of commands (in grunt):
y2 = foreach y1 generate $0 as timestamp, myudfs.httpArgParse($1) as argMap;
y3 = foreach y2 generate argMap#'s' as uid, timestamp as timestamp;
y4 = FILTER y3 BY (uid is not null);
y5 = GROUP y4 BY uid;
To give an idea of what sort of data is involved, ILLUSTRATE y4 yields:
-----------------------------------------------------------------------------------------------------
| y1 | timestamp: int | args: bag({tuple_of_tokens: (token: chararray)})                            |
-----------------------------------------------------------------------------------------------------
|    | 1265950806     | {(s=1381688313), (u=F68FFA1F655FDF494ABA520D95E1D99E), (ts=1265950805)}     |
-----------------------------------------------------------------------------------------------------
-----------------------------------------------------------------------------------------------
| y2 | timestamp: int | argMap: map                                                           |
-----------------------------------------------------------------------------------------------
|    | 1265950806     | {u=F68FFA1F655FDF494ABA520D95E1D99E, ts=1265950805, s=1381688313}     |
-----------------------------------------------------------------------------------------------
--------------------------------------------
| y3 | uid: bytearray | timestamp: int |
--------------------------------------------
| | 1381688313 | 1265950806 |
--------------------------------------------
--------------------------------------------
| y4 | uid: bytearray | timestamp: int |
--------------------------------------------
| | 1381688313 | 1265950806 |
--------------------------------------------
The same problem occurs when the FILTER command is omitted. The relevant
chunk of code in myudfs.httpArgParse is:
StringTokenizer tok = new StringTokenizer((String)pair, "=", false);
if (tok.hasMoreTokens()) {
    String oKey = tok.nextToken();
    if (tok.hasMoreTokens()) {
        Object oValue = tok.nextToken();
        output.put(oKey, oValue);
    } else {
        output.put(oKey, null);
    }
}
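To make that chunk easier to try outside of Pig, here is a minimal standalone sketch of the same splitting logic. The class name ArgPairDemo and the method parsePairs are made up for this post (the real UDF wraps this logic in its own class), and the input is just the sample row from the ILLUSTRATE output above. It shows that the values stored in the map are plain java.lang.String objects, which may or may not be related to the NullableText in the error.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.StringTokenizer;

// Hypothetical standalone harness; not the actual myudfs.httpArgParse class.
class ArgPairDemo {

    // Splits each "key=value" pair on '=' and stores the result in a map.
    // A pair with no value after the key is stored with a null value.
    static Map<String, Object> parsePairs(String[] pairs) {
        Map<String, Object> output = new HashMap<String, Object>();
        for (String pair : pairs) {
            StringTokenizer tok = new StringTokenizer(pair, "=", false);
            if (tok.hasMoreTokens()) {
                String oKey = tok.nextToken();
                if (tok.hasMoreTokens()) {
                    // nextToken() returns a String, so the map value is a String.
                    Object oValue = tok.nextToken();
                    output.put(oKey, oValue);
                } else {
                    output.put(oKey, null);
                }
            }
        }
        return output;
    }

    public static void main(String[] args) {
        Map<String, Object> argMap = parsePairs(new String[] {
            "s=1381688313",
            "u=F68FFA1F655FDF494ABA520D95E1D99E",
            "ts=1265950805"
        });
        System.out.println(argMap.get("s"));                      // prints 1381688313
        System.out.println(argMap.get("s").getClass().getName()); // prints java.lang.String
    }
}
```

So the value that argMap#'s' pulls out for uid is a String at runtime, even though ILLUSTRATE reports uid as a bytearray.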
If anyone has any insight into how I could get this to work, that would
really help me out.
Thanks,
Kris
P.S. For those who remember my earlier post about getting httpArgParse
to compile, I took the advice to ditch the InternalMap in favour of a
HashMap<String,Object>.
--
Kris Coward http://unripe.melon.org/
GPG Fingerprint: 2BF3 957D 310A FEEC 4733 830E 21A4 05C7 1FEB 12B3