Re: import location over Hadoop

Thilo Goetz Thu, 12 Jun 2008 07:15:28 -0700

rohan rai wrote:

Just edited it. Hopefully it is explanatory enough.


That's great, thanks Rohan.


On Thu, Jun 12, 2008 at 2:24 PM, Thilo Goetz <[EMAIL PROTECTED]> wrote:

Hi Rohan,

Good question. I added a page under "developer tips" that I
suggest you use:
http://cwiki.apache.org/confluence/display/UIMA/Running+UIMA+Apps+on+Hadoop

--Thilo


rohan rai wrote:

Hi Thilo

Sorry for asking such a simple thing... Under which topic should I add
this info?

Regards
Rohan

On Thu, Jun 12, 2008 at 2:21 AM, Thilo Goetz <[EMAIL PROTECTED]> wrote:

Hi Rohan,

I'm glad you got it to work.  This is useful information.  It would
be great if you could put it up on the UIMA Wiki:
http://cwiki.apache.org/UIMA/

--Thilo


rohan rai wrote:

I think I got it. Thanks for all the help, you guys. To make a
simple UIMA app work over Hadoop (I did it in a pseudo-distributed
environment), 3-4 factors come together:

1) The UIMA app, along with the mapper, the reducer, your job main file
and the resources, should be contained within the job jar you create.

2) Probably all imports in the descriptors should be imports by name
(I haven't verified whether this works with imports by location).

3) Any resource read in any of the classes should be read via the
ClassLoader, e.g.

   XMLInputSource in = new XMLInputSource(
       ClassLoader.getSystemResourceAsStream(aeXmlDescriptor), null);

4) When any AnalysisEngine (or a similar UIMA component) is being
produced (I am doing it in the mapper), a ResourceManager should be
used, e.g.

   ResourceManager rMng = UIMAFramework.newDefaultResourceManager();
   // str is the path to any of the resources, which can be obtained via
   // ClassLoader.getSystemResource(aeXmlDescriptor).getPath()
   rMng.setExtensionClassPath(str, true);
   rMng.setDataPath(str);
   aEngine = UIMAFramework.produceAnalysisEngine(aSpecifier, rMng, null);
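Points 3 and 4 both come down to locating files through the classloader instead of relative to the current directory. A minimal JDK-only sketch of two helpers for that follows; the class name `ClasspathResources`, its methods, and the descriptor name `desc/MyAE.xml` are hypothetical. It assumes the descriptor sits in a plain directory on the classpath (as after the job jar has been unjarred), not inside a still-packed jar, and it assumes the data path should be the directory that contains the descriptor.

```java
import java.io.File;
import java.io.FileNotFoundException;
import java.io.InputStream;
import java.net.URISyntaxException;
import java.net.URL;

/**
 * Hypothetical helper: looks up UIMA descriptors and resources through a
 * ClassLoader, so lookups hit the classpath (the unjarred job jar) rather
 * than the task's current working directory.
 */
public final class ClasspathResources {

    private ClasspathResources() {}

    /** Point 3: open a resource (e.g. "desc/MyAE.xml") as a stream via the classloader. */
    public static InputStream open(ClassLoader cl, String name) throws FileNotFoundException {
        InputStream in = cl.getResourceAsStream(name);
        if (in == null) {
            throw new FileNotFoundException("not on classpath: " + name);
        }
        return in;
    }

    /**
     * Point 4: the directory that contains a resource, usable as the string
     * handed to setExtensionClassPath / setDataPath (assumption: the
     * containing directory is what should be passed).
     * Only works for plain files; a resource still inside a jar has no directory.
     */
    public static String directoryOf(ClassLoader cl, String name) throws FileNotFoundException {
        URL url = cl.getResource(name);
        if (url == null) {
            throw new FileNotFoundException("not on classpath: " + name);
        }
        if (!"file".equals(url.getProtocol())) {
            throw new IllegalStateException("not a plain file: " + url);
        }
        try {
            // toURI() decodes escapes such as %20, unlike URL.getPath()
            return new File(url.toURI()).getParent();
        } catch (URISyntaxException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

Assuming `cl = ClassLoader.getSystemClassLoader()`, the stream from `open(cl, aeXmlDescriptor)` could be handed to `new XMLInputSource(stream, null)`, and `directoryOf(cl, aeXmlDescriptor)` used as `str` for `setExtensionClassPath` and `setDataPath`. Going through `toURI()` rather than `getPath()` is a deliberate choice in this sketch: `getPath()` keeps escapes such as `%20`, which would break on directories whose names contain spaces.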

This 4th point matters because when we read an XML file without using
the classloader, by default it is read from the temp task directory, e.g.

/tmp/hadoop-root/mapred/local/taskTracker/jobcache/job_200806112341_0002/task_200806112341_0002_m_000000_0/

but all the resources and classes get unjarred into the

/tmp/hadoop-root/mapred/local/taskTracker/jobcache/job_200806112341_0002/work

directory.

So, to tell the system to look for the resources in the correct
directory when not using the classloader (which is what UIMA's
XMLInputSource does), we have to use a resource manager.
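To check which of the two directories a lookup actually hits on a task node, a small diagnostic along these lines might help. It is a JDK-only sketch with a hypothetical class name `WhereIsIt` and a hypothetical default descriptor name; it relies only on the documented Java behaviour that a relative `java.io.File` is resolved against the `user.dir` system property (the working directory, i.e. the task directory above), while `ClassLoader.getResource` searches the classpath.

```java
import java.io.File;
import java.net.URL;

/**
 * Hypothetical diagnostic: prints where a relative descriptor path would be
 * read from (the JVM working directory) versus where the classloader finds
 * it (the classpath, e.g. the unjarred "work" directory).
 */
public final class WhereIsIt {

    private WhereIsIt() {}

    /** Where new File(name) or new FileInputStream(name) would look. */
    public static String relativeFileLocation(String name) {
        // A relative File is resolved against the "user.dir" system property.
        return new File(name).getAbsolutePath();
    }

    /** Where the classloader finds the resource, or null if it does not. */
    public static URL classpathLocation(ClassLoader cl, String name) {
        return cl.getResource(name);
    }

    /** Two-line report comparing both lookups for the same name. */
    public static String report(ClassLoader cl, String name) {
        File f = new File(name);
        URL u = classpathLocation(cl, name);
        return "relative file: " + f.getAbsolutePath()
                + (f.exists() ? " (exists)" : " (missing)")
                + System.lineSeparator()
                + "classloader:   " + (u == null ? "(not found)" : u.toString());
    }

    public static void main(String[] args) {
        // "desc/MyAE.xml" is a hypothetical descriptor name
        String name = args.length > 0 ? args[0] : "desc/MyAE.xml";
        System.out.println(report(Thread.currentThread().getContextClassLoader(), name));
    }
}
```

Run from inside a mapper (or with the same classpath and working directory), a "(missing)" relative file next to a found classloader URL would point to exactly the task-directory vs. work-directory mismatch described above.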

Regards
Rohan

 ...
