For starters, you can specify them through the -libjars parameter when you kick off your M/R job. That way the jars get shipped to all the TaskTrackers (TTs) and added to the task classpath.
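One caveat: -libjars is handled by GenericOptionsParser, so it is only picked up if your driver goes through ToolRunner (or parses the generic options itself). Here is a minimal driver sketch of that pattern -- the class, mapper/reducer, and path names are just placeholders, not something from this thread:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.conf.Configured;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
    import org.apache.hadoop.util.Tool;
    import org.apache.hadoop.util.ToolRunner;

    public class JsonWordCount extends Configured implements Tool {

      @Override
      public int run(String[] args) throws Exception {
        // getConf() already reflects the -libjars entries applied by ToolRunner.
        Job job = new Job(getConf(), "json wordcount");
        job.setJarByClass(JsonWordCount.class);
        job.setMapperClass(JsonWordCountMapper.class);   // placeholder mapper name
        job.setReducerClass(WordCountReducer.class);     // placeholder sum reducer
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        return job.waitForCompletion(true) ? 0 : 1;
      }

      public static void main(String[] args) throws Exception {
        System.exit(ToolRunner.run(new Configuration(), new JsonWordCount(), args));
      }
    }

You would then launch with something along the lines of (jar name and paths are placeholders):

    hadoop jar jsonwordcount.jar JsonWordCount -libjars /path/to/json.jar /input /output

If you would rather avoid the flag, two common alternatives are building a single fat jar with the org.json classes unpacked into it, or putting the dependency jar inside a lib/ directory of your job jar, which Hadoop also adds to the task classpath.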
Regards,
Shahab

On Thu, May 30, 2013 at 2:43 PM, jamal sasha <[email protected]> wrote:

> Hi, thanks guys.
> I figured out the issue. Hence I have another question.
> I am using a third-party library and I thought that once I have created
> the jar file I don't need to specify the dependencies, but apparently
> that's not the case (error below).
> Very, very naive question... probably stupid: how do I specify third-party
> libraries (jars) in Hadoop?
>
> Error:
> Error: java.lang.ClassNotFoundException: org.json.JSONException
>     at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
>     at java.security.AccessController.doPrivileged(Native Method)
>     at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
>     at java.lang.ClassLoader.loadClass(ClassLoader.java:306)
>     at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
>     at java.lang.ClassLoader.loadClass(ClassLoader.java:247)
>     at java.lang.Class.forName0(Native Method)
>     at java.lang.Class.forName(Class.java:247)
>     at org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:820)
>     at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:865)
>     at org.apache.hadoop.mapreduce.JobContext.getMapperClass(JobContext.java:199)
>     at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:719)
>     at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
>     at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
>     at java.security.AccessController.doPrivileged(Native Method)
>     at javax.security.auth.Subject.doAs(Subject.java:396)
>     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1093)
>     at org.apache.hadoop.mapred.Child.main(Child.java:249)
>
>
> On Thu, May 30, 2013 at 2:02 AM, Pramod N <[email protected]> wrote:
>
>> Whatever you are trying to do should work.
>> Here is the modified WordCount map:
>>
>> public void map(LongWritable key, Text value, Context context)
>>     throws IOException, InterruptedException {
>>   String line = value.toString();
>>   JSONObject line_as_json = new JSONObject(line);
>>   String text = line_as_json.getString("text");
>>   StringTokenizer tokenizer = new StringTokenizer(text);
>>   while (tokenizer.hasMoreTokens()) {
>>     word.set(tokenizer.nextToken());
>>     context.write(word, one);
>>   }
>> }
>>
>> Pramod N <http://atmachinelearner.blogspot.in>
>> Bruce Wayne of web
>> @machinelearner <https://twitter.com/machinelearner>
>>
>> --
>>
>> On Thu, May 30, 2013 at 8:42 AM, Rahul Bhattacharjee <[email protected]> wrote:
>>
>>> Whatever you have mentioned, Jamal, should work. You can debug this.
>>>
>>> Thanks,
>>> Rahul
>>>
>>> On Thu, May 30, 2013 at 5:14 AM, jamal sasha <[email protected]> wrote:
>>>
>>>> Hi,
>>>> For some reason, this has to be in Java :(
>>>> I am trying to use the org.json library, something like this (in the mapper):
>>>>
>>>>   JSONObject jsn = new JSONObject(value.toString());
>>>>   String text = (String) jsn.get("text");
>>>>   StringTokenizer itr = new StringTokenizer(text);
>>>>
>>>> But it's not working :(
>>>> It would be better to get this working properly, but I wouldn't mind
>>>> using a hack as well :)
>>>>
>>>> On Wed, May 29, 2013 at 4:30 PM, Michael Segel <[email protected]> wrote:
>>>>
>>>>> Yeah,
>>>>> I have to agree w/ Russell. Pig is definitely the way to go on this.
>>>>>
>>>>> If you want to do it as a Java program you will have to do some work
>>>>> on the input string, but that too should be trivial.
>>>>> How formal do you want to go?
>>>>> Do you want to strip it down or just find the quote after the text part?
>>>>>
>>>>> On May 29, 2013, at 5:13 PM, Russell Jurney <[email protected]> wrote:
>>>>>
>>>>> Seriously consider Pig (free answer, 4 LOC):
>>>>>
>>>>> my_data = LOAD 'my_data.json' USING
>>>>>     com.twitter.elephantbird.pig.load.JsonLoader() AS json:map[];
>>>>> words = FOREACH my_data GENERATE $0#'author' AS author,
>>>>>     FLATTEN(TOKENIZE($0#'text')) AS word;
>>>>> word_counts = FOREACH (GROUP words BY word) GENERATE group AS word,
>>>>>     COUNT_STAR(words) AS word_count;
>>>>> STORE word_counts INTO '/tmp/word_counts.txt';
>>>>>
>>>>> It will be faster than the Java you'll likely write.
>>>>>
>>>>> On Wed, May 29, 2013 at 2:54 PM, jamal sasha <[email protected]> wrote:
>>>>>
>>>>>> Hi,
>>>>>> I am stuck again. :(
>>>>>> My input data is in HDFS. I am again trying to do wordcount, but there
>>>>>> is a slight difference.
>>>>>> The data is in JSON format, so each line of data is:
>>>>>>
>>>>>> {"author":"foo", "text": "hello"}
>>>>>> {"author":"foo123", "text": "hello world"}
>>>>>> {"author":"foo234", "text": "hello this world"}
>>>>>>
>>>>>> So I want to do wordcount for the "text" part.
>>>>>> I understand that in the mapper I just have to parse this line as JSON
>>>>>> and extract "text", and the rest of the code is just the same, but I am
>>>>>> trying to switch from Python to Java Hadoop.
>>>>>> How do I do this?
>>>>>> Thanks
>>>>>
>>>>> --
>>>>> Russell Jurney twitter.com/rjurney [email protected] datasyndrome.com
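
For what it's worth, here is a self-contained sketch of the mapper discussed in the thread above, with the imports and the word/one fields filled in. It is untested, the class name is just illustrative, and it assumes the org.json jar ends up on the task classpath as described at the top of this message:

    import java.io.IOException;
    import java.util.StringTokenizer;

    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.json.JSONException;
    import org.json.JSONObject;

    public class JsonWordCountMapper
        extends Mapper<LongWritable, Text, Text, IntWritable> {

      private final static IntWritable one = new IntWritable(1);
      private final Text word = new Text();

      @Override
      public void map(LongWritable key, Text value, Context context)
          throws IOException, InterruptedException {
        try {
          // Each input line is a JSON object like {"author":"foo", "text":"hello"}.
          JSONObject lineAsJson = new JSONObject(value.toString());
          String text = lineAsJson.getString("text");

          // Plain word count over the "text" field only.
          StringTokenizer tokenizer = new StringTokenizer(text);
          while (tokenizer.hasMoreTokens()) {
            word.set(tokenizer.nextToken());
            context.write(word, one);
          }
        } catch (JSONException e) {
          // Skip malformed lines rather than failing the whole task.
        }
      }
    }

The standard WordCount sum reducer works unchanged with this mapper.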
