Hi, I did that but I am still getting the same exception. I ran:

export HADOOP_CLASSPATH=/path/to/external.jar

and then added -libjars /path/to/external.jar to my command, but the error is unchanged.
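A likely reason the jar is still not found: exporting HADOOP_CLASSPATH only affects the client-side JVM, not the task JVMs on the worker nodes, and the generic options (-libjars, -files, -D) are only honoured when the driver hands its arguments to GenericOptionsParser, which normally happens by running the main class through ToolRunner. They also have to appear after the class name but before the job's own arguments. A minimal driver sketch along those lines (the class names JsonWordCount and JsonWordCountMapper are illustrative, not taken from this thread; the reducer is omitted because the point here is the ToolRunner plumbing):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

// Driver that lets GenericOptionsParser see -libjars before the job is submitted.
public class JsonWordCount extends Configured implements Tool {

  @Override
  public int run(String[] args) throws Exception {
    // getConf() already carries whatever -libjars/-D options ToolRunner parsed.
    Job job = new Job(getConf(), "json wordcount");
    job.setJarByClass(JsonWordCount.class);
    job.setMapperClass(JsonWordCountMapper.class); // hypothetical mapper class
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));
    return job.waitForCompletion(true) ? 0 : 1;
  }

  public static void main(String[] args) throws Exception {
    // ToolRunner strips the generic options out of args and applies them to
    // the Configuration before run() is called.
    System.exit(ToolRunner.run(new Configuration(), new JsonWordCount(), args));
  }
}

With such a driver, the invocation would look roughly like: hadoop jar wordcount.jar JsonWordCount -libjars /path/to/external.jar <input> <output>.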
On Thu, May 30, 2013 at 11:46 AM, Shahab Yunus <[email protected]> wrote:

> For starters, you can specify them through the -libjars parameter when you
> kick off your M/R job. This way the jars will be copied to all TTs.
>
> Regards,
> Shahab
>
>
> On Thu, May 30, 2013 at 2:43 PM, jamal sasha <[email protected]> wrote:
>
>> Hi, thanks guys.
>> I figured out the issue, so I have another question.
>> I am using a third-party library, and I thought that once I had created
>> the jar file I didn't need to specify the dependencies, but apparently
>> that's not the case (error below).
>> A very naive question, probably stupid: how do I specify third-party
>> libraries (jars) in Hadoop?
>>
>> Error:
>> Error: java.lang.ClassNotFoundException: org.json.JSONException
>>     at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
>>     at java.security.AccessController.doPrivileged(Native Method)
>>     at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
>>     at java.lang.ClassLoader.loadClass(ClassLoader.java:306)
>>     at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
>>     at java.lang.ClassLoader.loadClass(ClassLoader.java:247)
>>     at java.lang.Class.forName0(Native Method)
>>     at java.lang.Class.forName(Class.java:247)
>>     at org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:820)
>>     at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:865)
>>     at org.apache.hadoop.mapreduce.JobContext.getMapperClass(JobContext.java:199)
>>     at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:719)
>>     at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
>>     at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
>>     at java.security.AccessController.doPrivileged(Native Method)
>>     at javax.security.auth.Subject.doAs(Subject.java:396)
>>     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1093)
>>     at org.apache.hadoop.mapred.Child.main(Child.java:249)
>>
>>
>> On Thu, May 30, 2013 at 2:02 AM, Pramod N <[email protected]> wrote:
>>
>>> Whatever you are trying to do should work.
>>> Here is the modified WordCount map:
>>>
>>> public void map(LongWritable key, Text value, Context context)
>>>     throws IOException, InterruptedException {
>>>   String line = value.toString();
>>>   JSONObject line_as_json = new JSONObject(line);
>>>   String text = line_as_json.getString("text");
>>>   StringTokenizer tokenizer = new StringTokenizer(text);
>>>   while (tokenizer.hasMoreTokens()) {
>>>     word.set(tokenizer.nextToken());
>>>     context.write(word, one);
>>>   }
>>> }
>>>
>>> Pramod N <http://atmachinelearner.blogspot.in>
>>> Bruce Wayne of web
>>> @machinelearner <https://twitter.com/machinelearner>
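The snippet above leaves out the surrounding class, the word and one fields, and the imports, so for reference here is a self-contained sketch of that mapper (the class name JsonWordCountMapper is illustrative and matches the hypothetical driver above; malformed JSON lines are skipped rather than failing the task):

import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.json.JSONException;
import org.json.JSONObject;

// Counts words in the "text" field of one JSON object per input line.
public class JsonWordCountMapper
    extends Mapper<LongWritable, Text, Text, IntWritable> {

  private static final IntWritable one = new IntWritable(1);
  private final Text word = new Text();

  @Override
  protected void map(LongWritable key, Text value, Context context)
      throws IOException, InterruptedException {
    try {
      JSONObject json = new JSONObject(value.toString());
      String text = json.getString("text");
      StringTokenizer tokenizer = new StringTokenizer(text);
      while (tokenizer.hasMoreTokens()) {
        word.set(tokenizer.nextToken());
        context.write(word, one);
      }
    } catch (JSONException e) {
      // Skip lines that are not valid JSON or that have no "text" field.
    }
  }
}

Because the mapper references org.json classes, the json jar has to be on the task classpath at runtime, which is exactly what the ClassNotFoundException above is complaining about.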
>>> On Thu, May 30, 2013 at 8:42 AM, Rahul Bhattacharjee <[email protected]> wrote:
>>>
>>>> Whatever you have mentioned, Jamal, should work. You can debug this.
>>>>
>>>> Thanks,
>>>> Rahul
>>>>
>>>>
>>>> On Thu, May 30, 2013 at 5:14 AM, jamal sasha <[email protected]> wrote:
>>>>
>>>>> Hi,
>>>>> For some reason, this has to be in Java :(
>>>>> I am trying to use the org.json library, something like (in the mapper):
>>>>>
>>>>> JSONObject jsn = new JSONObject(value.toString());
>>>>> String text = (String) jsn.get("text");
>>>>> StringTokenizer itr = new StringTokenizer(text);
>>>>>
>>>>> But it's not working :(
>>>>> It would be better to get this working properly, but I wouldn't mind
>>>>> using a hack as well :)
>>>>>
>>>>>
>>>>> On Wed, May 29, 2013 at 4:30 PM, Michael Segel <[email protected]> wrote:
>>>>>
>>>>>> Yeah, I have to agree with Russell. Pig is definitely the way to go on this.
>>>>>>
>>>>>> If you want to do it as a Java program you will have to do some work
>>>>>> on the input string, but that too should be trivial.
>>>>>> How formal do you want to go?
>>>>>> Do you want to strip it down or just find the quote after the "text" part?
>>>>>>
>>>>>>
>>>>>> On May 29, 2013, at 5:13 PM, Russell Jurney <[email protected]> wrote:
>>>>>>
>>>>>> Seriously consider Pig (free answer, 4 LOC):
>>>>>>
>>>>>> my_data = LOAD 'my_data.json' USING
>>>>>>     com.twitter.elephantbird.pig.load.JsonLoader() AS json:map[];
>>>>>> words = FOREACH my_data GENERATE $0#'author' as author,
>>>>>>     FLATTEN(TOKENIZE($0#'text')) as word;
>>>>>> word_counts = FOREACH (GROUP words BY word) GENERATE group AS word,
>>>>>>     COUNT_STAR(words) AS word_count;
>>>>>> STORE word_counts INTO '/tmp/word_counts.txt';
>>>>>>
>>>>>> It will be faster than the Java you'll likely write.
>>>>>>
>>>>>>
>>>>>> On Wed, May 29, 2013 at 2:54 PM, jamal sasha <[email protected]> wrote:
>>>>>>
>>>>>>> Hi,
>>>>>>> I am stuck again :(
>>>>>>> My input data is in HDFS. I am again trying to do word count, but there
>>>>>>> is a slight difference: the data is in JSON format.
>>>>>>> So each line of data is:
>>>>>>>
>>>>>>> {"author":"foo", "text": "hello"}
>>>>>>> {"author":"foo123", "text": "hello world"}
>>>>>>> {"author":"foo234", "text": "hello this world"}
>>>>>>>
>>>>>>> So I want to do the word count on the "text" part.
>>>>>>> I understand that in the mapper I just have to parse each line as JSON
>>>>>>> and extract "text", and the rest of the code is just the same, but I am
>>>>>>> trying to switch from Python to Java Hadoop.
>>>>>>> How do I do this?
>>>>>>> Thanks
>>>>>>
>>>>>>
>>>>>> --
>>>>>> Russell Jurney twitter.com/rjurney [email protected] datasyndrome.com
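If -libjars keeps getting ignored, two other common ways to put the org.json classes on the task classpath are to package the dependency jar inside a lib/ directory of the job jar itself (jars under lib/ in the submitted job jar are normally added to the task classpath automatically), or to add it to the distributed cache from the driver. A sketch of the second option, assuming the jar has already been uploaded to a hypothetical HDFS path /libs/json.jar; everything not shown stays as in the earlier driver sketch:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.filecache.DistributedCache;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

// Driver variant that ships the org.json jar through the distributed cache
// instead of relying on -libjars. /libs/json.jar is a placeholder HDFS path.
public class JsonWordCountDriver extends Configured implements Tool {

  @Override
  public int run(String[] args) throws Exception {
    Configuration conf = getConf();
    // The jar must already exist in HDFS, e.g. uploaded with `hadoop fs -put`.
    DistributedCache.addFileToClassPath(new Path("/libs/json.jar"), conf);

    Job job = new Job(conf, "json wordcount");
    job.setJarByClass(JsonWordCountDriver.class);
    job.setMapperClass(JsonWordCountMapper.class);
    // ... remaining job setup (output types, input/output paths) as in the
    // earlier driver sketch ...
    return job.waitForCompletion(true) ? 0 : 1;
  }

  public static void main(String[] args) throws Exception {
    System.exit(ToolRunner.run(new Configuration(), new JsonWordCountDriver(), args));
  }
}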
