Just edited it. Hopefully it is explanatory enough.

On Thu, Jun 12, 2008 at 2:24 PM, Thilo Goetz <[EMAIL PROTECTED]> wrote:
> Hi Rohan,
>
> good question. I added a page under "developer tips". I
> suggest you use:
> http://cwiki.apache.org/confluence/display/UIMA/Running+UIMA+Apps+on+Hadoop
>
> --Thilo
>
> rohan rai wrote:
>
>> Hi Thilo
>>
>> Sorry for asking such a simple thing... Under which topic should I add
>> this info?
>>
>> Regards
>> Rohan
>>
>> On Thu, Jun 12, 2008 at 2:21 AM, Thilo Goetz <[EMAIL PROTECTED]> wrote:
>>
>>> Hi Rohan,
>>>
>>> I'm glad you got it to work. This is useful information. It would
>>> be great if you could put it up on the UIMA Wiki:
>>> http://cwiki.apache.org/UIMA/
>>>
>>> --Thilo
>>>
>>> rohan rai wrote:
>>>
>>>> I think I got it. Thanks for all the help, you guys. To make a
>>>> simple UIMA app work over Hadoop (I did it in a pseudo-distributed
>>>> environment), 3-4 factors come together:
>>>>
>>>> 1) The UIMA app, along with the mapper, the reducer, your job main
>>>> file and the resources, should be contained within the job jar you
>>>> created.
>>>>
>>>> 2) Probably all imports in the descriptor should be imports by name
>>>> (I haven't verified whether this works with location).
>>>>
>>>> 3) Any resource read in any of the class files should be read via
>>>> the ClassLoader, e.g.
>>>>
>>>>     XMLInputSource in = new XMLInputSource(
>>>>         ClassLoader.getSystemResourceAsStream(aeXmlDescriptor), null);
>>>>
>>>> 4) When any AnalysisEngine or similar UIMA component is being
>>>> produced (I am doing it in the mapper), a ResourceManager should be
>>>> used, e.g.
>>>>
>>>>     ResourceManager rMng = UIMAFramework.newDefaultResourceManager();
>>>>     // Here str is the path to any of the resources, which can be
>>>>     // obtained via
>>>>     // ClassLoader.getSystemResource(aeXmlDescriptor).getPath()
>>>>     rMng.setExtensionClassPath(str, true);
>>>>     rMng.setDataPath(str);
>>>>     aEngine = UIMAFramework.produceAnalysisEngine(aSpecifier, rMng, null);
>>>>
>>>> This 4th point has to be considered because, when we read an XML
>>>> file without using the classloader, by default it is read from the
>>>> temp task directory, e.g.
>>>>
>>>> /tmp/hadoop-root/mapred/local/taskTracker/jobcache/job_200806112341_0002/task_200806112341_0002_m_000000_0/
>>>>
>>>> But all the resources and classes get unjarred in the
>>>>
>>>> /tmp/hadoop-root/mapred/local/taskTracker/jobcache/job_200806112341_0002/work
>>>>
>>>> directory.
>>>>
>>>> So to tell the system to look for the resources in the correct
>>>> directory when not using the classloader (which is what UIMA's
>>>> XMLInputSource does), we have to use the resource manager.
>>>>
>>>> Regards
>>>> Rohan
>>>>
>>>> ...
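
For anyone who wants to see the effect behind points 3 and 4 without a Hadoop setup, here is a minimal sketch using only the JDK. It is not UIMA or Hadoop code: a temporary directory stands in for the directory where the job jar is unpacked, and a URLClassLoader stands in for the task's class loader. The class name, method names and file name are all made up for the illustration. It only shows that a class-loader lookup finds the resource wherever the class path points, while a relative file path is resolved against the process working directory.

```java
import java.io.File;
import java.io.IOException;
import java.io.InputStream;
import java.net.URL;
import java.net.URLClassLoader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical demo class; nothing here is part of UIMA or Hadoop.
final class ResourceLookupDemo {

    /** Reads a named resource through the given class loader; returns null if the loader cannot find it. */
    static String readViaClassLoader(ClassLoader loader, String name) throws IOException {
        try (InputStream in = loader.getResourceAsStream(name)) {
            return in == null ? null : new String(in.readAllBytes(), StandardCharsets.UTF_8);
        }
    }

    /** Treats the same name as a relative file path, which the JVM resolves against the working directory. */
    static boolean existsRelativeToWorkingDir(String name) {
        return new File(name).exists();
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for the directory where the job jar contents end up (assumption for the demo).
        Path unpacked = Files.createTempDirectory("jobcache-work");
        String name = "demo-descriptor-" + System.nanoTime() + ".xml";
        Files.write(unpacked.resolve(name), "<descriptor/>".getBytes(StandardCharsets.UTF_8));

        // The class loader is told where the unpacked resources live; the working directory is not.
        try (URLClassLoader loader = new URLClassLoader(new URL[] { unpacked.toUri().toURL() })) {
            System.out.println("via class loader: " + readViaClassLoader(loader, name));
            System.out.println("relative path exists: " + existsRelativeToWorkingDir(name));
        }
    }
}
```

In the real job, the ResourceManager calls quoted above play the role of the URLClassLoader here: they tell UIMA where the unpacked resources are, so lookups that do not go through the class loader are no longer resolved against the task directory.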
