That's most likely because the XML isn't valid :-)
Seriously, the "no content allowed in prolog" message
is sometimes due to an incorrect text encoding.
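In case it helps: a UTF-8 byte order mark in front of the XML declaration is a classic cause of exactly that parser message. Here is a minimal sketch for checking that (the `BomCheck` helper is hypothetical, not part of UIMA):

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.PushbackInputStream;

public class BomCheck {
    // Returns true if the stream begins with a UTF-8 byte order mark
    // (0xEF 0xBB 0xBF), which some XML parsers reject as content in
    // the prolog. The bytes are pushed back so the stream can still
    // be handed to the real parser afterwards.
    public static boolean hasUtf8Bom(InputStream in) throws IOException {
        PushbackInputStream pb = new PushbackInputStream(in, 3);
        byte[] head = new byte[3];
        int n = pb.read(head);
        boolean bom = n == 3
                && (head[0] & 0xFF) == 0xEF
                && (head[1] & 0xFF) == 0xBB
                && (head[2] & 0xFF) == 0xBF;
        if (n > 0) {
            pb.unread(head, 0, n); // put the bytes back for the real parser
        }
        return bom;
    }

    public static void main(String[] args) throws IOException {
        byte[] withBom = {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF, '<', '?'};
        byte[] clean = {'<', '?', 'x', 'm', 'l'};
        System.out.println(hasUtf8Bom(new ByteArrayInputStream(withBom))); // true
        System.out.println(hasUtf8Bom(new ByteArrayInputStream(clean)));   // false
    }
}
```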

Does this run ok locally?

--Thilo

rohan rai wrote:
Thanks Thilo. Well, if I do that, all sorts of invalid XML exceptions get
thrown:

org.apache.uima.util.InvalidXMLException: Invalid descriptor at <unknown source>.
        at org.apache.uima.util.impl.XMLParser_impl.parse(XMLParser_impl.java:193)
        at org.apache.uima.util.impl.XMLParser_impl.parseResourceSpecifier(XMLParser_impl.java:365)
        at org.apache.uima.util.impl.XMLParser_impl.parseResourceSpecifier(XMLParser_impl.java:346)
        at org.ziva.dq.hadoop.DQHadoopMain$Map.dQFile(DQHadoopMain.java:45)
        at org.ziva.dq.hadoop.DQHadoopMain$Map.map(DQHadoopMain.java:37)
        at org.ziva.dq.hadoop.DQHadoopMain$Map.map(DQHadoopMain.java:1)
        at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:208)
        at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:2084)
Caused by: org.xml.sax.SAXParseException: Content is not allowed in prolog.
        at com.sun.org.apache.xerces.internal.parsers.AbstractSAXParser.parse(AbstractSAXParser.java:1231)
        at com.sun.org.apache.xerces.internal.jaxp.SAXParserImpl$JAXPSAXParser.parse(SAXParserImpl.java:522)
        at org.apache.uima.util.impl.XMLParser_impl.parse(XMLParser_impl.java:176)
        ... 8 more



On Wed, Jun 11, 2008 at 6:08 PM, Thilo Goetz <[EMAIL PROTECTED]> wrote:

You need to use import by name instead of import
by location in your descriptor.  Then things get
loaded via the classpath and you should be ok
(provided that you stick your descriptors in the
jar of course).  I suggest you test this locally
first by moving your application to a different
machine where you don't have any descriptors
lying around.  It'll be easier to debug than in
hadoop.
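For reference, the difference in the descriptor looks roughly like this (the dotted name below is guessed from the path in the FileNotFoundException further down; adjust it to your own package layout):

```xml
<!-- Import by location: resolved relative to the importing descriptor's
     file system path; breaks inside a Hadoop task working directory. -->
<import location="annotators/RecordCandidateAnnotator.xml"/>

<!-- Import by name: resolved via the classpath/datapath (".xml" is
     appended automatically), so it works as long as the descriptor
     is packaged inside the job jar. -->
<import name="descriptors.annotators.RecordCandidateAnnotator"/>
```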

--Thilo


rohan rai wrote:

Well, the question is about running UIMA over Hadoop. How do I do that, given
that UIMA's XML descriptors have relative URLs and locations, which throw
exceptions?

But I can probably do without that answer.

Simplifying the problem:

I create a jar for my application and I am trying to run a map-reduce job.

In the map I am trying to read an XML resource, which gives this kind of
exception:

java.io.FileNotFoundException:
/tmp/hadoop-root/mapred/local/taskTracker/jobcache/job_200806102252_0028/task_200806102252_0028_m_000000_0/./descriptors/annotators/RecordCandidateAnnotator.xml
(No such file or directory)
        at java.io.FileInputStream.open(Native Method)
        at java.io.FileInputStream.<init>(FileInputStream.java:106)
        at java.io.FileInputStream.<init>(FileInputStream.java:66)
        at sun.net.www.protocol.file.FileURLConnection.connect(FileURLConnection.java:70)
        at sun.net.www.protocol.file.FileURLConnection.getInputStream(FileURLConnection.java:161)
        at java.net.URL.openStream(URL.java:1009)
        at org.apache.uima.util.XMLInputSource.<init>(XMLInputSource.java:83)

I think I need to pass the content of the jar containing the resource XML and
the classes (other than the job class) on to each and every taskXXXXXXX that
gets created.

How can I do that?
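(For what it's worth: Hadoop already ships the job jar to every task, so anything packaged inside the jar is reachable through the task's classloader. The trick is to load the resource through the classpath instead of a file path. A sketch, assuming the descriptor is inside the jar; `DescriptorLoader`/`openDescriptor` are hypothetical names, not a UIMA or Hadoop API:)

```java
import java.io.InputStream;

public class DescriptorLoader {
    // Load a descriptor as a classpath resource rather than via a
    // file path, so it resolves inside the unpacked job jar on any
    // task tracker node.
    public static InputStream openDescriptor(String resourcePath) {
        InputStream in = Thread.currentThread().getContextClassLoader()
                .getResourceAsStream(resourcePath);
        if (in == null) {
            throw new IllegalArgumentException(
                    "Not on classpath: " + resourcePath
                    + " -- check that the entry is really inside the job jar"
                    + " (jar tf yourjob.jar)");
        }
        return in;
    }
}
```

(The resulting stream could then be handed to UIMA, e.g. `new XMLInputSource(DescriptorLoader.openDescriptor("descriptors/annotators/RecordCandidateAnnotator.xml"), null)`, provided the descriptors directory is actually packaged in the jar.)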

Regards
Rohan




On Wed, Jun 11, 2008 at 5:12 PM, Michael Baessler <[EMAIL PROTECTED]> wrote:

 rohan rai wrote:
Hi

A simple thing such as a name annotator, which has an import of type
location, starts throwing exceptions when I create a jar of the application
I am developing and run it over hadoop.

If I had to do it in a java class file, then I could use

XMLInputSource in = new
XMLInputSource(ClassLoader.getSystemResourceAsStream(aeXmlDescriptor), null);

But the relative paths in annotators, analysis engines etc. start throwing
exceptions.

Please help

Regards
Rohan

 I'm not sure I understand your question, but I think you need some help
with the exceptions you get.
Can you provide the exception stack trace?

-- Michael


