I don't think JSONObject implements the interface that a class needs in order to be used as a key in the Map/Reduce library. WritableComparable is the one, I think.
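
In case it helps, here is a rough sketch of what a usable key might look like. It is only an illustration under some assumptions, not tested against your job: CanonicalJsonKey and roundTrip are names I made up, the fields are assumed to be already parsed into a map, and the Hadoop interface itself is left out so the snippet compiles with the JDK alone. In a real job the class would be declared as implementing WritableComparable<CanonicalJsonKey>, whose contract (write, readFields, compareTo) the methods below mirror. Sorting the fields by name is what makes two records with the same data in a different order compare as equal.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical key type, not part of Hadoop or org.json.
// In a real job it would be declared as
//   implements org.apache.hadoop.io.WritableComparable<CanonicalJsonKey>
// The Hadoop interface is omitted so this compiles with the JDK alone.
class CanonicalJsonKey implements Comparable<CanonicalJsonKey> {
    private String canonical = "";

    // The framework creates key instances reflectively, so keep a no-arg constructor.
    CanonicalJsonKey() {}

    // Assumes the top-level JSON fields were already parsed into a map.
    // Sorting by field name makes the key independent of field order.
    CanonicalJsonKey(Map<String, ?> fields) {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, Object> e : new TreeMap<String, Object>(fields).entrySet()) {
            // Simplification: '=' and '\n' inside names or values are not escaped.
            sb.append(e.getKey()).append('=').append(e.getValue()).append('\n');
        }
        canonical = sb.toString();
    }

    // Serialization half of the contract. writeUTF caps the encoded string at 65535 bytes.
    public void write(DataOutput out) throws IOException {
        out.writeUTF(canonical);
    }

    // Deserialization half of the contract: restores the state written by write().
    public void readFields(DataInput in) throws IOException {
        canonical = in.readUTF();
    }

    // Ordering used when keys are sorted before the reduce phase.
    @Override
    public int compareTo(CanonicalJsonKey other) {
        return canonical.compareTo(other.canonical);
    }

    @Override
    public boolean equals(Object o) {
        return o instanceof CanonicalJsonKey
                && canonical.equals(((CanonicalJsonKey) o).canonical);
    }

    // Equal keys must hash alike so they are sent to the same reducer.
    @Override
    public int hashCode() {
        return canonical.hashCode();
    }

    // Demo helper: serialize a key to bytes and read it back into a fresh instance.
    static CanonicalJsonKey roundTrip(CanonicalJsonKey key) {
        try {
            ByteArrayOutputStream buf = new ByteArrayOutputStream();
            key.write(new DataOutputStream(buf));
            CanonicalJsonKey copy = new CanonicalJsonKey();
            copy.readFields(new DataInputStream(new ByteArrayInputStream(buf.toByteArray())));
            return copy;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Same fields, inserted in a different order.
        Map<String, Object> a = new LinkedHashMap<>();
        a.put("ts", 1368758947.291035);
        a.put("isSecure", true);
        a.put("version", 2);
        Map<String, Object> b = new LinkedHashMap<>();
        b.put("version", 2);
        b.put("ts", 1368758947.291035);
        b.put("isSecure", true);

        CanonicalJsonKey ka = new CanonicalJsonKey(a);
        CanonicalJsonKey kb = new CanonicalJsonKey(b);
        System.out.println(ka.compareTo(kb) == 0);     // prints "true"
        System.out.println(roundTrip(ka).equals(kb));  // prints "true"
    }
}
```

A simpler route, if you don't need a custom type, is probably to build the same sorted string in the mapper and emit it as a Text key, since Text already implements WritableComparable. Either way, because your reducer emits NullWritable keys, I suspect you would also want to set the map output key class separately with conf.setMapOutputKeyClass().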

Regards,
Shahab

On Tue, Jun 4, 2013 at 6:49 PM, Max Lebedev <[email protected]> wrote:

> Hi. I've been trying to use JSONObjects to identify duplicates in
> JSONStrings.
> The duplicate strings contain the same data, but not necessarily in the
> same order. For example the following two lines should be identified as
> duplicates (and filtered).
>
> {"ts":1368758947.291035,"isSecure":true,"version":2,"source":"sdk","debug":false
> {"ts":1368758947.291035,"version":2,"source":"sdk","isSecure":true,"debug":false}
>
> This is the code:
>
> class DupFilter{
>
>     public static class Map extends MapReduceBase implements Mapper<LongWritable, Text, JSONObject, Text> {
>
>         public void map(LongWritable key, Text value, OutputCollector<JSONObject, Text> output, Reporter reporter) throws IOException{
>             JSONObject jo = null;
>             try {
>                 jo = new JSONObject(value.toString());
>             } catch (JSONException e) {
>                 e.printStackTrace();
>             }
>             output.collect(jo, value);
>         }
>     }
>
>     public static class Reduce extends MapReduceBase implements Reducer<JSONObject, Text, NullWritable, Text> {
>
>         public void reduce(JSONObject jo, Iterator<Text> lines, OutputCollector<NullWritable, Text> output, Reporter reporter) throws IOException {
>             output.collect(null, lines.next());
>         }
>     }
>
>     public static void main(String[] args) throws Exception {
>         JobConf conf = new JobConf(DupFilter.class);
>         conf.setOutputKeyClass(JSONObject.class);
>         conf.setOutputValueClass(Text.class);
>         conf.setMapperClass(Map.class);
>         conf.setReducerClass(Reduce.class);
>         conf.setInputFormat(TextInputFormat.class);
>         conf.setOutputFormat(TextOutputFormat.class);
>         FileInputFormat.setInputPaths(conf, new Path(args[0]));
>         FileOutputFormat.setOutputPath(conf, new Path(args[1]));
>         JobClient.runJob(conf);
>     }
> }
>
> I get the following error:
>
> java.lang.ClassCastException: class org.json.JSONObject
>     at java.lang.Class.asSubclass(Class.java:3027)
>     at org.apache.hadoop.mapred.JobConf.getOutputKeyComparator(JobConf.java:795)
>     at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.<init>(MapTask.java:817)
>     at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:383)
>     at org.apache.hadoop.mapred.MapTask.run(MapTask.java:325)
>     at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:210)
>
> It looks like it has something to do with conf.setOutputKeyClass(). Am I
> doing something wrong here?
>
> Thanks,
>
> Max Lebedev
