rohan rai wrote:
Just edited it. Hopefully it is explanatory enough.
That's great, thanks Rohan.
On Thu, Jun 12, 2008 at 2:24 PM, Thilo Goetz <[EMAIL PROTECTED]> wrote:
Hi Rohan,
Good question. I added a page under "developer tips" that I
suggest you use:
http://cwiki.apache.org/confluence/display/UIMA/Running+UIMA+Apps+on+Hadoop
--Thilo
rohan rai wrote:
Hi Thilo
Sorry for asking such a simple thing... Under which topic should I add
this info?
Regards
Rohan
On Thu, Jun 12, 2008 at 2:21 AM, Thilo Goetz <[EMAIL PROTECTED]> wrote:
Hi Rohan,
I'm glad you got it to work. This is useful information. It would
be great if you could put it up on the UIMA Wiki:
http://cwiki.apache.org/UIMA/
--Thilo
rohan rai wrote:
I think I got it. Thanks for all the help, guys. To make a
simple UIMA app work over Hadoop (I did it in a pseudo-distributed
environment), three or four factors come together:
1) The UIMA app, along with the mapper, the reducer, your job main file,
and the resources, should be contained within the job jar you create.
2) All imports in the descriptor should probably be imports by name
(I haven't verified whether this works with import by location).
3) Any resource read in any of the classes should be loaded via the
class loader.
E.g. XMLInputSource in = new XMLInputSource(ClassLoader.getSystemResourceAsStream(aeXmlDescriptor), null);
4) When an AnalysisEngine or a similar UIMA component is produced
(I am doing it in the mapper), a ResourceManager should be used.
E.g. ResourceManager rMng = UIMAFramework.newDefaultResourceManager();
// Here str is the path to any of the resources, which can be obtained via
// ClassLoader.getSystemResource(aeXmlDescriptor).getPath()
rMng.setExtensionClassPath(str, true);
rMng.setDataPath(str);
aEngine = UIMAFramework.produceAnalysisEngine(aSpecifier, rMng, null);
This 4th point matters because when we read an XML file without using the
class loader, it is by default read from the temp task directory, e.g.
/tmp/hadoop-root/mapred/local/taskTracker/jobcache/job_200806112341_0002/task_200806112341_0002_m_000000_0/
But all the resources and classes get unjarred in the
/tmp/hadoop-root/mapred/local/taskTracker/jobcache/job_200806112341_0002/work
directory.
So, to tell the system to look for the resources in the correct
directory when not using the class loader (which is what UIMA's
XMLInputSource does), we have to use the resource manager.
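The difference between the two lookups can be sketched without UIMA or
Hadoop, using only the JDK. In this minimal sketch, the class name
ResourceLookupSketch, the file name aeDescriptor.xml, and the temporary
"work" and "task" directories are hypothetical stand-ins for the job cache
layout described above. It only illustrates the general behaviour (a class
loader finds what is on its class path, while a relative file lookup depends
on the directory it is resolved against), not the actual UIMA code path.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.net.URL;
import java.net.URLClassLoader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

class ResourceLookupSketch {

    // Hypothetical descriptor name, used for illustration only.
    static final String NAME = "aeDescriptor.xml";

    /** Reads a resource through the given class loader; returns null if it is not found. */
    static String readViaClassLoader(ClassLoader loader, String name) {
        try (InputStream in = loader.getResourceAsStream(name)) {
            return in == null ? null : new String(in.readAllBytes(), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Resolves a name against an explicit directory, the way a relative file path would be. */
    static boolean existsRelativeTo(Path dir, String name) {
        return Files.exists(dir.resolve(name));
    }

    /** Builds a throwaway "work" and "task" directory pair and compares the lookups. */
    static String demo() {
        try {
            Path root = Files.createTempDirectory("jobcache");
            Path workDir = Files.createDirectory(root.resolve("work")); // where the jar is unpacked
            Path taskDir = Files.createDirectory(root.resolve("task")); // the task's working directory
            Files.write(workDir.resolve(NAME), "<desc/>".getBytes(StandardCharsets.UTF_8));

            // The class path contains only the "work" directory, as an assumption.
            try (URLClassLoader loader = new URLClassLoader(new URL[] {workDir.toUri().toURL()})) {
                boolean viaLoader = "<desc/>".equals(readViaClassLoader(loader, NAME));
                return "classloader=" + viaLoader
                        + " taskDir=" + existsRelativeTo(taskDir, NAME)
                        + " workDir=" + existsRelativeTo(workDir, NAME);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

The class loader lookup and the lookup against the "work" directory should
both succeed, while the lookup against the "task" directory should not,
which is why the resource manager has to be pointed at the right place.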
Regards
Rohan
...