I see you mentioned it is a record (in a line), so it would be fine, and it has other text data too in that record.

On May 3, 2015 1:45 AM, "Sandeep Khurana" <[email protected]> wrote:
> This code won't work if the JSON spans more than one line in the input
> files.
>
> On May 3, 2015 1:41 AM, "Shambhavi Punja" <[email protected]> wrote:
>
>> Hi Shahab,
>>
>> Thanks. That helped.
>>
>> Regards,
>> Shambhavi
>>
>> On Thu, Apr 30, 2015 at 10:18 AM, Shahab Yunus <[email protected]> wrote:
>>
>>> The reason is that the JSON parsing code is in a third-party library
>>> which is not included in the default MapReduce/Hadoop distribution. You
>>> have to add it to your classpath at *runtime*. There are multiple ways
>>> to do this (which also depend on how you plan to run and package/deploy
>>> your code.)
>>>
>>> Check out these:
>>>
>>> https://hadoopi.wordpress.com/2014/06/05/hadoop-add-third-party-libraries-to-mapreduce-job/
>>>
>>> http://blog.cloudera.com/blog/2011/01/how-to-include-third-party-libraries-in-your-map-reduce-job/
>>>
>>> Regards,
>>> Shahab
>>>
>>> On Thu, Apr 30, 2015 at 1:01 PM, Shambhavi Punja <[email protected]> wrote:
>>>
>>>> Hi,
>>>>
>>>> I am working on an assignment on Hadoop MapReduce. I am very new to
>>>> MapReduce.
>>>>
>>>> The assignment has many sections, but for now I am trying to parse
>>>> JSON data.
>>>>
>>>> The input (i.e. the value) to the map function is a single record of
>>>> the form xyz, {'abc':'pqr1','abc2':'pq1, pq2'}, {'key':'value1'}
>>>> I am interested only in getting the frequency of value1.
>>>>
>>>> Following is the map-reduce job.
>>>>
>>>> public static class Map extends MapReduceBase implements
>>>>         Mapper<LongWritable, Text, Text, IntWritable> {
>>>>     private final static IntWritable one = new IntWritable(1);
>>>>     private Text word = new Text();
>>>>
>>>>     public void map(LongWritable key, Text value,
>>>>             OutputCollector<Text, IntWritable> output, Reporter reporter)
>>>>             throws IOException {
>>>>         String line = value.toString();
>>>>         String[] tuple = line.split("(?<=\\}),\\s");
>>>>         try {
>>>>             JSONObject obj = new JSONObject(tuple[1]);
>>>>             String id = obj.getString("key");
>>>>             word.set(id);
>>>>             output.collect(word, one);
>>>>         }
>>>>         catch (JSONException e) {
>>>>             e.printStackTrace();
>>>>         }
>>>>     }
>>>> }
>>>>
>>>> public static class Reduce extends MapReduceBase implements
>>>>         Reducer<Text, IntWritable, Text, IntWritable> {
>>>>     public void reduce(Text key, Iterator<IntWritable> values,
>>>>             OutputCollector<Text, IntWritable> output, Reporter reporter)
>>>>             throws IOException {
>>>>         int sum = 0;
>>>>         while (values.hasNext()) {
>>>>             sum += values.next().get();
>>>>         }
>>>>         output.collect(key, new IntWritable(sum));
>>>>     }
>>>> }
>>>>
>>>> I successfully compiled the Java code using the JSON and Hadoop jars
>>>> and created a jar. But when I run the Hadoop command I am getting the
>>>> following exceptions.
>>>>
>>>> 15/04/30 00:36:49 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
>>>> 15/04/30 00:36:49 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
>>>> 15/04/30 00:36:49 WARN snappy.LoadSnappy: Snappy native library not loaded
>>>> 15/04/30 00:36:49 INFO mapred.FileInputFormat: Total input paths to process : 1
>>>> 15/04/30 00:36:49 INFO mapred.JobClient: Running job: job_local1121514690_0001
>>>> 15/04/30 00:36:49 INFO mapred.LocalJobRunner: Waiting for map tasks
>>>> 15/04/30 00:36:49 INFO mapred.LocalJobRunner: Starting task: attempt_local1121514690_0001_m_000000_0
>>>> 15/04/30 00:36:49 INFO mapred.Task: Using ResourceCalculatorPlugin : null
>>>> 15/04/30 00:36:49 INFO mapred.MapTask: Processing split: file:/Users/Shamvi/gumgum/jars/input/ab1.txt:0+305
>>>> 15/04/30 00:36:49 INFO mapred.MapTask: numReduceTasks: 1
>>>> 15/04/30 00:36:49 INFO mapred.MapTask: io.sort.mb = 100
>>>> 15/04/30 00:36:49 INFO mapred.MapTask: data buffer = 79691776/99614720
>>>> 15/04/30 00:36:49 INFO mapred.MapTask: record buffer = 262144/327680
>>>> 15/04/30 00:36:49 INFO mapred.LocalJobRunner: Map task executor complete.
>>>> 15/04/30 00:36:49 WARN mapred.LocalJobRunner: job_local1121514690_0001
>>>> java.lang.Exception: java.lang.RuntimeException: Error in configuring object
>>>>     at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:354)
>>>> Caused by: java.lang.RuntimeException: Error in configuring object
>>>>     at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:93)
>>>>     at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:64)
>>>>     at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:117)
>>>>     at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:426)
>>>>     at org.apache.hadoop.mapred.MapTask.run(MapTask.java:366)
>>>>     at org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:223)
>>>>     at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>>>>     at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>>>>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>>>>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>>>>     at java.lang.Thread.run(Thread.java:745)
>>>> Caused by: java.lang.reflect.InvocationTargetException
>>>>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>>>     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>>>>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>>>>     at java.lang.reflect.Method.invoke(Method.java:483)
>>>>     at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:88)
>>>>     ... 10 more
>>>> Caused by: java.lang.NoClassDefFoundError: org/json/JSONException
>>>>     at java.lang.Class.forName0(Native Method)
>>>>     at java.lang.Class.forName(Class.java:344)
>>>>     at org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:810)
>>>>     at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:855)
>>>>     at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:881)
>>>>     at org.apache.hadoop.mapred.JobConf.getMapperClass(JobConf.java:968)
>>>>     at org.apache.hadoop.mapred.MapRunner.configure(MapRunner.java:34)
>>>>     ... 15 more
>>>> Caused by: java.lang.ClassNotFoundException: org.json.JSONException
>>>>     at java.net.URLClassLoader$1.run(URLClassLoader.java:372)
>>>>     at java.net.URLClassLoader$1.run(URLClassLoader.java:361)
>>>>     at java.security.AccessController.doPrivileged(Native Method)
>>>>     at java.net.URLClassLoader.findClass(URLClassLoader.java:360)
>>>>     at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
>>>>     at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
>>>>     ... 22 more
>>>> 15/04/30 00:36:50 INFO mapred.JobClient: map 0% reduce 0%
>>>> 15/04/30 00:36:50 INFO mapred.JobClient: Job complete: job_local1121514690_0001
>>>> 15/04/30 00:36:50 INFO mapred.JobClient: Counters: 0
>>>> 15/04/30 00:36:50 INFO mapred.JobClient: Job Failed: NA
>>>> Exception in thread "main" java.io.IOException: Job failed!
>>>>     at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1357)
>>>>     at org.myorg.Wordcount.main(Wordcount.java:64)
>>>>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>>>     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>>>>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>>>>     at java.lang.reflect.Method.invoke(Method.java:483)
>>>>     at org.apache.hadoop.util.RunJar.main(RunJar.java:160)
>>>>
>>>> PS: When I modify the same code and exclude the JSON parsing, i.e. find
>>>> the frequency of the {'key':'value1'} section of the example input, all
>>>> works well.
>>>>
>>>
>>
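[Editor's note: the record-splitting step in the map() function above can be exercised on its own, outside Hadoop. The sketch below uses only the JDK standard library, so it runs without the org.json jar whose absence caused the NoClassDefFoundError; the `new JSONObject(tuple[1]).getString("key")` call is stood in for by a plain regex extraction, and the class name `SplitDemo` is invented for illustration.]

```java
import java.util.Arrays;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class SplitDemo {
    public static void main(String[] args) {
        // The example record from the question.
        String line = "xyz, {'abc':'pqr1','abc2':'pq1, pq2'}, {'key':'value1'}";

        // Same lookbehind split as the mapper: break on ", " only when it
        // is preceded by a closing brace, so the comma inside 'pq1, pq2'
        // does not split the record.
        String[] tuple = line.split("(?<=\\}),\\s");
        System.out.println(Arrays.toString(tuple));
        // tuple[1] is the trailing {'key':'value1'} object.

        // Stand-in for new JSONObject(tuple[1]).getString("key"):
        Matcher m = Pattern.compile("'key'\\s*:\\s*'([^']*)'").matcher(tuple[1]);
        if (m.find()) {
            System.out.println(m.group(1));  // prints "value1"
        }
    }
}
```

This also illustrates why the split only works when the whole record sits on one line, as discussed at the top of the thread: the lookbehind never sees a `}` followed by `,` if the record is delivered to map() as two separate lines.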

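[Editor's note: one common way to follow the runtime-classpath advice given in the thread is the `-libjars` mechanism. This is a sketch only: the jar path is hypothetical, and `-libjars` is honored only when the driver parses its arguments through GenericOptionsParser (e.g. by implementing Tool), which the warning in the log output above indicates this job does not yet do.]

```shell
# Hypothetical path -- substitute the org.json jar you compiled against.
JSON_JAR=/path/to/json.jar

# Make the jar visible to the client JVM that submits the job...
export HADOOP_CLASSPATH="$JSON_JAR"

# ...and ship it to the map/reduce tasks with -libjars.
hadoop jar wordcount.jar org.myorg.Wordcount -libjars "$JSON_JAR" input/ output/
```

Alternatives mentioned in the linked articles include bundling the dependency inside the job jar's `lib/` directory, which avoids the Tool requirement entirely.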