Also to add, the default serialization libraries supported are specified in core-default.xml as:

<property>
  <name>io.serializations</name>
  <value>org.apache.hadoop.io.serializer.WritableSerialization,org.apache.hadoop.io.serializer.avro.AvroSpecificSerialization,org.apache.hadoop.io.serializer.avro.AvroReflectSerialization</value>
  <description>A list of serialization classes that can be used for obtaining serializers and deserializers.</description>
</property>

Since default Java serialization isn't supported, you would need to convert to *Writables, which Hadoop can use for better, more compact serialization of objects.

Regards,
Ravi Magham

On Tue, Aug 27, 2013 at 9:27 PM, Shahab Yunus <[email protected]> wrote:

> As far as I understand, StringTokenizer.nextToken returns a Java String
> object, which does not implement the Writable and Comparable interfaces
> required for Hadoop MapReduce serialization and transport. The Text class
> does implement them and is therefore compatible; that is why it is used
> to wrap the Java String and pass it on.
>
> Regards,
> Shahab
>
>
> On Tue, Aug 27, 2013 at 11:16 AM, Andrew Pennebaker <[email protected]> wrote:
>
>> In https://hadoop.apache.org/docs/stable/mapred_tutorial.html#Source+Code,
>> line 16 declares:
>>
>> private Text word = new Text();
>>
>> ...
>>
>> But only lines 22 and 23 use this, and only to pass the value along to
>> output:
>>
>> word.set(tokenizer.nextToken());
>> output.collect(word, one);
>>
>> Wouldn't this be better expressed as:
>>
>> (no private Text word)
>>
>> ...
>>
>> output.collect(tokenizer.nextToken(), one);
>>
>> ?
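To illustrate the point about Writables, here is a minimal, self-contained sketch (plain JDK, no Hadoop on the classpath; TinyText is a hypothetical stand-in, not Hadoop's actual Text implementation) of the Writable-style write/readFields contract: the object serializes only a length prefix plus payload bytes, with none of the class metadata that java.io.Serializable would add, and a single instance can be reused across records the way the tutorial reuses its private Text word field.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;

// Hypothetical stand-in for org.apache.hadoop.io.Text, mirroring the shape
// of the Writable contract: write(DataOutput) / readFields(DataInput).
// (Hadoop's real Text uses a variable-length int prefix; this sketch uses
// a fixed 4-byte int for simplicity.)
public class TinyText {
    private String value = "";

    public void set(String s) { value = s; }
    public String get() { return value; }

    // Serialize: 4-byte length prefix followed by the raw UTF-8 payload.
    public void write(DataOutput out) throws IOException {
        byte[] bytes = value.getBytes("UTF-8");
        out.writeInt(bytes.length);
        out.write(bytes);
    }

    // Deserialize into this (reusable) instance.
    public void readFields(DataInput in) throws IOException {
        byte[] bytes = new byte[in.readInt()];
        in.readFully(bytes);
        value = new String(bytes, "UTF-8");
    }

    public static void main(String[] args) throws IOException {
        TinyText word = new TinyText();  // reused across records, like the tutorial's field
        word.set("hadoop");

        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        word.write(new DataOutputStream(buf));

        TinyText copy = new TinyText();
        copy.readFields(new DataInputStream(new ByteArrayInputStream(buf.toByteArray())));

        // "hadoop" is 6 UTF-8 bytes + a 4-byte length prefix = 10 bytes total.
        System.out.println(buf.size());   // prints 10
        System.out.println(copy.get());   // prints hadoop
    }
}
```

A Java String carries no such contract, which is why the mapper must wrap each token in a Text before handing it to output.collect.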
