Re: import location over Hadoop

rohan rai Thu, 12 Jun 2008 06:25:03 -0700

Just edited it. Hopefully it is explanatory enough.

On Thu, Jun 12, 2008 at 2:24 PM, Thilo Goetz <[EMAIL PROTECTED]> wrote:


> Hi Rohan,
>
> good question.  I added a page under "developer tips" that I
> suggest you use:
> http://cwiki.apache.org/confluence/display/UIMA/Running+UIMA+Apps+on+Hadoop
>
> --Thilo
>
>
> rohan rai wrote:
>
>> Hi Thilo
>>
>> Sorry for asking such a simple thing... Under which topic should I add
>> this info?
>>
>> Regards
>> Rohan
>>
>> On Thu, Jun 12, 2008 at 2:21 AM, Thilo Goetz <[EMAIL PROTECTED]> wrote:
>>
>>> Hi Rohan,
>>>
>>> I'm glad you got it to work.  This is useful information.  It would
>>> be great if you could put it up on the UIMA Wiki:
>>> http://cwiki.apache.org/UIMA/
>>>
>>> --Thilo
>>>
>>>
>>> rohan rai wrote:
>>>
>>>> I think I got it. Thanks for all the help, guys. To make a simple UIMA
>>>> app work over Hadoop (I did it in a pseudo-distributed environment),
>>>> four things have to come together:
>>>>
>>>> 1) The UIMA app, along with the mapper, the reducer, your job's main
>>>> file and the resources, should be contained within the job jar you
>>>> create.
>>>>
>>>> 2) All imports in the descriptor should probably be imports by name
>>>> (I haven't verified whether this works with import by location).
>>>>
>>>> 3) Any resource read in any of the classes should be loaded via the
>>>> ClassLoader, e.g.
>>>>
>>>>   XMLInputSource in = new XMLInputSource(
>>>>       ClassLoader.getSystemResourceAsStream(aeXmlDescriptor), null);
>>>>
>>>> 4) When an AnalysisEngine or a similar UIMA component is produced (I am
>>>> doing it in the mapper), a ResourceManager should be used, e.g.
>>>>
>>>>   // str is the path to any of the resources, which can be obtained via
>>>>   // ClassLoader.getSystemResource(aeXmlDescriptor).getPath()
>>>>   ResourceManager rMng = UIMAFramework.newDefaultResourceManager();
>>>>   rMng.setExtensionClassPath(str, true);
>>>>   rMng.setDataPath(str);
>>>>   aEngine = UIMAFramework.produceAnalysisEngine(aSpecifier, rMng, null);
>>>>
>>>> The 4th point matters because when we read an XML file without using
>>>> the classloader, it is read by default from the temp task directory, e.g.
>>>>
>>>> /tmp/hadoop-root/mapred/local/taskTracker/jobcache/job_200806112341_0002/task_200806112341_0002_m_000000_0/
>>>>
>>>> But all the resources and classes get unjarred into the directory
>>>>
>>>> /tmp/hadoop-root/mapred/local/taskTracker/jobcache/job_200806112341_0002/work
>>>>
>>>> So, to tell the system to look for the resources in the correct
>>>> directory when not using the classloader (which is what UIMA's
>>>> XMLInputSource does), we have to use the resource manager.
>>>>
>>>> Regards
>>>> Rohan
>>>>
>>>>  ...
>>>
>>>
>>>
>>
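
For anyone reading this in the archive: points 3 and 4 of the quoted
message can be sketched with plain JDK calls. This is only a rough
illustration, not UIMA or Hadoop code. The class name ResourceLookup and
its two methods are made up for the example. It assumes the job's
resources have been unpacked to disk, as described above for the work
directory, so that the class loader returns a "file:" URL; for a resource
still inside a jar, directoryOf() simply gives up.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.URISyntaxException;
import java.net.URL;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper for illustration; not part of UIMA or Hadoop.
public class ResourceLookup {

    // Point 3: open a resource through the class loader, so the lookup
    // does not depend on the task's current working directory.
    public static InputStream open(ClassLoader loader, String name) throws IOException {
        InputStream in = loader.getResourceAsStream(name);
        if (in == null) {
            throw new IOException("Resource not found on classpath: " + name);
        }
        return in;
    }

    // Point 4: find the directory that holds a resource. Only handles
    // resources that are unpacked files ("file:" URLs), not jar entries.
    public static Path directoryOf(ClassLoader loader, String name) throws IOException {
        URL url = loader.getResource(name);
        if (url == null) {
            throw new IOException("Resource not found on classpath: " + name);
        }
        if (!"file".equals(url.getProtocol())) {
            throw new IOException("Resource is not an unpacked file: " + url);
        }
        try {
            // Going through URI decodes escaped characters such as %20.
            return Paths.get(url.toURI()).getParent();
        } catch (URISyntaxException e) {
            throw new IOException(e);
        }
    }
}
```

The stream from open() is what would go into the XMLInputSource
constructor in point 3, and directoryOf(...).toString() is one way to get
the str that point 4 passes to setExtensionClassPath and setDataPath.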
