Ruta - Text Ruler - NullPointerException with the example project

2014-02-12 Thread Nicolas Hernandez
Hi everyone,

I'm testing the TextRuler framework to induce annotation rules. In
particular, I'm following the example in the documentation [1], which should
work on the example project in the svn repository [2].

Unfortunately, when I press the start button, the view freezes on
"MethodPreprocessing..." and that's all. When I launch Eclipse from the
command line, the console shows the following exception trace [3].

The same happens with all the learning algorithms.
I use Eclipse Kepler. No Eclipse updates are available for the plugins.
The Java version is "1.7.0_51" (OpenJDK).

Does anyone (Peter? ;-) have a clue?

/Nicolas

[1]
https://uima.apache.org/d/ruta-current/tools.ruta.book.html#section.tools.ruta.workbench.textruler.example

[2]
https://svn.apache.org/repos/asf/uima/ruta/trunk/example-projects/TextRulerExample/

[3] Trace
$ eclipse
uima.ruta.example.Author
uima.ruta.example.Date
uima.ruta.example.Pages
uima.ruta.example.Publisher
uima.ruta.example.Institution
uima.ruta.example.Volume
uima.ruta.example.Editor
uima.ruta.example.Title
uima.ruta.example.Booktitle
uima.ruta.example.Note
uima.ruta.example.Journal
uima.ruta.example.Location
uima.ruta.example.Tech
Exception in thread "Thread-7" java.lang.NullPointerException
    at org.apache.uima.ruta.textruler.core.TextRulerToolkit.addBoundaryTypes(TextRulerToolkit.java:154)
    at org.apache.uima.ruta.textruler.extension.TextRulerPreprocessor.run(TextRulerPreprocessor.java:58)
    at org.apache.uima.ruta.textruler.extension.TextRulerPreprocessor.run(TextRulerPreprocessor.java:44)
    at org.apache.uima.ruta.textruler.extension.TextRulerController$1.run(TextRulerController.java:174)
    at java.lang.Thread.run(Thread.java:744)


Re: Error deploying pear on AS 2.4.2

2014-02-12 Thread Bai Shen
That would explain why it's not working. :)  What command should I use to
deploy my pear?

The documentation talks about merging, packaging, and installing the pear,
but I can't find any mention of deploying it.

Is there a specific section that I should be looking at?

Thanks.


On Wed, Feb 12, 2014 at 3:09 PM, Eddie Epstein  wrote:

> A pear is a packed UIMA analysis engine, or AE. UIMA-AS deploys services
> that contain AEs. The command deployAsyncService requires a UIMA-AS
> Deployment Descriptor.
>
>
> On Wed, Feb 12, 2014 at 10:52 AM, Bai Shen wrote:
>
> > I'm running the following command to deploy my pear to UIMA-AS 2.4.2.
> >
> > deployAsyncService.sh test_pear.xml -brokerURL tcp://uima-broker:61616
> >
> > I'm getting the following error.
> >
> > SEPM0004: When 'standalone' or 'doctype-system' is specified, the document
> > must be well-formed; but this document contains a top-level text node
> >
> > It's a saxon error, but I can't tell what's causing it.  Any suggestions
> > for where to look?
> >
> > Thanks.
> >
>


Re: Error deploying pear on AS 2.4.2

2014-02-12 Thread Eddie Epstein
A pear is a packed UIMA analysis engine, or AE. UIMA-AS deploys services
that contain AEs. The command deployAsyncService requires a UIMA-AS
Deployment Descriptor.


On Wed, Feb 12, 2014 at 10:52 AM, Bai Shen  wrote:

> I'm running the following command to deploy my pear to UIMA-AS 2.4.2.
>
> deployAsyncService.sh test_pear.xml -brokerURL tcp://uima-broker:61616
>
> I'm getting the following error.
>
> SEPM0004: When 'standalone' or 'doctype-system' is specified, the document
> must be well-formed; but this document contains a top-level text node
>
> It's a saxon error, but I can't tell what's causing it.  Any suggestions for
> where to look?
>
> Thanks.
>


Re: Unable to use ConceptMapper annotator

2014-02-12 Thread Peter Litsegård
Richard Eckart de Castilho  writes:

> 
> On 12.02.2014, at 11:22, Peter Litsegård  wrote:
> 
> > Why would the
> > ConceptMapper want to use these as the types declared on those xmls have
> > already been "Cas generated" and their .class files are present in the
> > CM-jar?
> 
> The generated JCas classes are just a way of mapping the UIMA type system to
> the Java type system. They offer a convenience for programming using the
> known class/getter/setter concepts in Java.
> 
> These classes are not a substitute for the XML-based type system definitions.
> The type system definitions are always required in addition to the JCas
> classes.
> 
> If you use only the JCas classes but do not initialize the CAS with the
> proper types, you'll get an error message like this:
> 
> "JCas type used in Java code, but was not declared in the XML type descriptor"
> 
> Cheers,
> 

Hi!

I've made some progress on this: no exceptions, that is :) However, I don't
get any hits when I use the ConceptMapper. I've set PrintDictionary = true,
which shows that it successfully loads 49 dictionary entries, and no
exceptions are thrown while executing the code below. I use the following code:

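import org.apache.uima.UIMAFramework;
import org.apache.uima.analysis_engine.AnalysisEngine;
import org.apache.uima.jcas.JCas;
import org.apache.uima.resource.ResourceSpecifier;
import org.apache.uima.util.XMLInputSource;

// Parse the ConceptMapper descriptor and instantiate the analysis engine
XMLInputSource in = new XMLInputSource(".../ConceptMapperOffsetTokenizer.xml");
ResourceSpecifier specifier =
    UIMAFramework.getXMLParser().parseResourceSpecifier(in);
AnalysisEngine ae = UIMAFramework.produceAnalysisEngine(specifier);

// Create a CAS for this engine, set the document text and run the engine
JCas jcas = ae.newJCas();
jcas.setDocumentText("...some text containing a number of dictentries");
ae.process(jcas);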

Now, how do I loop through the hits in the jcas-instance (the dictionary 
entry together with begin/end positions, SemClass which is part of the 
dictionary entry attributes etc.)?
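One way to loop over the hits is to walk the annotation index for the output type. The following is a minimal sketch, assuming the ConceptMapper writes its hits as annotations of the type named in the descriptor's ResultingAnnotationName parameter (check ConceptMapperOffsetTokenizer.xml for the actual name) and that attributes such as SemClass are copied to a feature of that type only if the descriptor maps them. HitLister and listHits are hypothetical names; the calls used (getAnnotationIndex, getFeatureByBaseName, getFeatureValueAsString) are from the core UIMA CAS/JCas API.

```java
import java.util.ArrayList;
import java.util.List;

import org.apache.uima.cas.FSIterator;
import org.apache.uima.cas.Feature;
import org.apache.uima.cas.Type;
import org.apache.uima.cas.text.AnnotationIndex;
import org.apache.uima.jcas.JCas;
import org.apache.uima.jcas.tcas.Annotation;

/** Hypothetical helper: lists annotations of one type after ae.process(jcas) has run. */
public final class HitLister {

    /**
     * @param typeName    fully qualified annotation type to list (assumed to be the
     *                    ConceptMapper's ResultingAnnotationName)
     * @param featureName optional primitive feature to print, e.g. "SemClass"; may be null
     */
    public static List<String> listHits(JCas jcas, String typeName, String featureName) {
        List<String> hits = new ArrayList<String>();
        Type type = jcas.getTypeSystem().getType(typeName);
        if (type == null) {
            // The type is not part of the CAS type system, so no hits of that type can exist.
            return hits;
        }
        // null if the type has no feature with this short name
        Feature feature = featureName == null ? null : type.getFeatureByBaseName(featureName);

        // The index for a type also contains instances of its subtypes.
        AnnotationIndex<Annotation> index = jcas.getAnnotationIndex(type);
        FSIterator<Annotation> it = index.iterator();
        while (it.hasNext()) {
            Annotation a = it.next();
            String line = a.getBegin() + "-" + a.getEnd() + " '" + a.getCoveredText() + "'";
            if (feature != null) {
                // Only valid for primitive features such as String
                line += " " + featureName + "=" + a.getFeatureValueAsString(feature);
            }
            hits.add(line);
        }
        return hits;
    }
}
```

Calling HitLister.listHits(jcas, "uima.tcas.Annotation", null) lists every annotation in the CAS, which is a quick way to see which types the engine actually produced before narrowing down to the ConceptMapper output type.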

I'm very sorry for posting these trivial questions but...



Error deploying pear on AS 2.4.2

2014-02-12 Thread Bai Shen
I'm running the following command to deploy my pear to UIMA-AS 2.4.2.

deployAsyncService.sh test_pear.xml -brokerURL tcp://uima-broker:61616

I'm getting the following error.

SEPM0004: When 'standalone' or 'doctype-system' is specified, the document
must be well-formed; but this document contains a top-level text node

It's a saxon error, but I can't tell what's causing it.  Any suggestions for
where to look?

Thanks.


Re: uima-as 2.3.1 - java.io.IOException: Frame size of 147 MB larger than max allowed 100 MB

2014-02-12 Thread Jaroslaw Cwiklik
It seems like the ActiveMQ documentation
(http://activemq.apache.org/configuring-wire-formats.html)
is wrong with respect to the default maxFrameSize being MAX_LONG. I checked the
ActiveMQ source code and the default is 100 MB:

public final class OpenWireFormat implements WireFormat {
    public static final int DEFAULT_VERSION = CommandTypes.PROTOCOL_STORE_VERSION;
    public static final int DEFAULT_WIRE_VERSION = CommandTypes.PROTOCOL_VERSION;
    public static final int DEFAULT_MAX_FRAME_SIZE = 100 * 1024 * 1024; // 100 MB   <-
    static final byte NULL_TYPE = CommandTypes.NULL;
    private static final int MARSHAL_CACHE_SIZE = Short.MAX_VALUE / 2;
    private static final int MARSHAL_CACHE_FREE_SPACE = 100;


UIMA-AS doesn't set this value, so the default is used unless it is
overridden. It seems to me that either your service or a client is not
overriding the default. Please check your deployment descriptors to make sure
that you are changing the default in the brokerURL.
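The check suggested above can be done mechanically: any endpoint whose brokerURL lacks a wireFormat.maxFrameSize option falls back to the 100 MB default shown in the OpenWireFormat excerpt. The following is a minimal plain-Java sketch with no ActiveMQ classes; FrameSizeCheck and its methods are hypothetical helpers, it only handles simple tcp:// URLs (no URL-decoding, no failover(...) syntax), and it does not model broker-side transportConnector settings. Note that 209715200 bytes is exactly 200 * 1024 * 1024, the 200 MB configured in the descriptors.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper: which frame-size limit does a given brokerURL imply? */
public final class FrameSizeCheck {
    // Mirrors OpenWireFormat.DEFAULT_MAX_FRAME_SIZE (100 MB = 104857600 bytes)
    static final long DEFAULT_MAX_FRAME_SIZE = 100L * 1024 * 1024;

    // Splits the query part of e.g. tcp://host:61616?a=1&b=2 into key/value pairs.
    static Map<String, String> queryParams(String brokerUrl) {
        Map<String, String> params = new LinkedHashMap<String, String>();
        int q = brokerUrl.indexOf('?');
        if (q < 0) {
            return params; // no options at all
        }
        for (String pair : brokerUrl.substring(q + 1).split("&")) {
            int eq = pair.indexOf('=');
            if (eq > 0) {
                params.put(pair.substring(0, eq), pair.substring(eq + 1));
            }
        }
        return params;
    }

    // The override from the URL if present, otherwise the 100 MB default.
    static long effectiveMaxFrameSize(String brokerUrl) {
        String value = queryParams(brokerUrl).get("wireFormat.maxFrameSize");
        return value == null ? DEFAULT_MAX_FRAME_SIZE : Long.parseLong(value.trim());
    }

    public static void main(String[] args) {
        String[] urls = {
            "tcp://127.0.0.1:61616?wireFormat.maxInactivityDuration=0"
                + "&wireFormat.maxFrameSize=209715200&jms.useCompression=true",
            "tcp://127.0.0.1:61616" // no override, so ActiveMQ's default applies
        };
        long frame = 147L * 1024 * 1024; // roughly the 147 MB frame from the error
        for (String url : urls) {
            long limit = effectiveMaxFrameSize(url);
            System.out.println(url + " -> limit " + limit + " bytes, 147 MB frame "
                + (frame <= limit ? "fits" : "is rejected"));
        }
    }
}
```

Feeding it every brokerURL from the service descriptors and the client side would show which endpoint, if any, still ends up with the 100 MB limit.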

Jerry


On Wed, Feb 12, 2014 at 9:21 AM, Mihaela M  wrote:

> Hello,
>
> I have upgraded uima-as to version 2.4.2, but I still encounter an issue
> with the wireFormat.maxFrameSize setting for the ActiveMQ broker.
> 1. I have updated the configuration for the transport connector in the
> activemq.xml file:
> 
> 2. I have set the brokerURL attribute in the uima-as deployment descriptors
> to the value:
> "tcp://127.0.0.1:61616?wireFormat.maxInactivityDuration=0&wireFormat.maxFrameSize=209715200&jms.useCompression=true"
> 3. I have set the TRACE level for the logger org.apache.activemq.transport.
>
> After performing all the above settings, I noticed that when I started the
> pipeline, multiple negotiations were performed for each remote delegate by
> org.apache.activemq.transport.WireFormatNegotiator. All use the
> maxFrameSize of 200 MB that I specified, except one negotiation that is
> done using a maxFrameSize of 100 MB.
> I do not understand where this 100 MB limit comes from. Does it exist in
> the UIMA client? As far as I could see, ActiveMQ uses MAX_LONG for
> maxFrameSize by default, so I really don't know where this 100 MB setting
> for maxFrameSize comes from.
>
> Does anyone have an idea why this is happening? Could somebody tell me a
> starting point for looking in the uima code?
>
>
> On the other hand, does anybody know whether there are some limitations
> when using the "binary" serializer for remote delegates instead of the "xmi"
> serializer? I found in one jira issue
> (https://issues.apache.org/jira/browse/UIMA-1196) that the "binary"
> serializer requires all uima AS services to use a common type system.
> Is this still an issue in uima-as 2.4.2?
>
> Thank you!
> Mihaela
>
>
>
>
> On Monday, January 27, 2014 4:30 PM, Eddie Epstein wrote:
>
> On Thu, Jan 23, 2014 at 9:28 AM, Thomas Ginter wrote:
>
> > It is likely then that your expansion is happening after the remote
> > service is called or else is not yet big enough to be over the 100MB
> > limit.
> >
>
> Also note that by default UIMA-AS [Java] services use a delta-CAS
> interface: only changes to the CAS are returned from a service.
>
> Besides deleting unnecessary feature structures (FS) from the final CAS to
> be returned, another option to consider is to use compression on JMS
> messages:
> jms.useCompression=true
> This decoration can be added to the broker configuration file,
> $UIMA_HOME/amq/conf/activemq-nojournal.xml
>
> as
>
> which will cause messages in all queues to be compressed.
>
> Eddie
>


Re: uima-as 2.3.1 - java.io.IOException: Frame size of 147 MB larger than max allowed 100 MB

2014-02-12 Thread Mihaela M
Hello,

I have upgraded uima-as to version 2.4.2, but I still encounter an issue with
the wireFormat.maxFrameSize setting for the ActiveMQ broker.
1. I have updated the configuration for the transport connector in the
activemq.xml file:

2. I have set the brokerURL attribute in the uima-as deployment descriptors to
the value:
"tcp://127.0.0.1:61616?wireFormat.maxInactivityDuration=0&wireFormat.maxFrameSize=209715200&jms.useCompression=true"
3. I have set the TRACE level for the logger org.apache.activemq.transport.

After performing all the above settings, I noticed that when I started the
pipeline, multiple negotiations were performed for each remote delegate by
org.apache.activemq.transport.WireFormatNegotiator. All use the maxFrameSize of
200 MB that I specified, except one negotiation that is done using a
maxFrameSize of 100 MB.
I do not understand where this 100 MB limit comes from. Does it exist in the
UIMA client? As far as I could see, ActiveMQ uses MAX_LONG for maxFrameSize by
default, so I really don't know where this 100 MB setting for maxFrameSize
comes from.

Does anyone have an idea why this is happening? Could somebody tell me a
starting point for looking in the uima code?


On the other hand, does anybody know whether there are some limitations when
using the "binary" serializer for remote delegates instead of the "xmi"
serializer? I found in one jira issue
(https://issues.apache.org/jira/browse/UIMA-1196) that the "binary" serializer
requires all uima AS services to use a common type system. Is this still an
issue in uima-as 2.4.2?

Thank you!
Mihaela




On Monday, January 27, 2014 4:30 PM, Eddie Epstein wrote:

On Thu, Jan 23, 2014 at 9:28 AM, Thomas Ginter wrote:

> It is likely then that your expansion is happening after the remote
> service is called or else is not yet big enough to be over the 100MB limit.
>

Also note that by default UIMA-AS [Java] services use a delta-CAS
interface: only changes to the CAS are returned from a service.

Besides deleting unnecessary feature structures (FS) from the final CAS to be
returned, another option to consider is to use compression on JMS messages:
jms.useCompression=true
This decoration can be added to the broker configuration file,
   $UIMA_HOME/amq/conf/activemq-nojournal.xml

as
   
which will cause messages in all queues to be compressed.

Eddie

Re: Unable to use ConceptMapper annotator

2014-02-12 Thread Richard Eckart de Castilho
On 12.02.2014, at 11:22, Peter Litsegård  wrote:

> Why would the
> ConceptMapper want to use these as the types declared on those xmls have
> already been "Cas generated" and their .class files are present in the CM-jar?

The generated JCas classes are just a way of mapping the UIMA type system to the
Java type system. They offer a convenience for programming using the known
class/getter/setter concepts in Java.

These classes are not a substitute for the XML-based type system definitions.
The type system definitions are always required in addition to the JCas classes.

If you use only the JCas classes but do not initialize the CAS with the proper
types, you'll get an error message like this:

"JCas type used in Java code, but was not declared in the XML type descriptor"

Cheers,

-- Richard

Re: Installing pears on DUCC

2014-02-12 Thread Bai Shen
I'll take a look.  Thanks.

I'm still learning UIMA as I inherited the cluster. :)


On Tue, Feb 11, 2014 at 3:46 PM, Eddie Epstein  wrote:

> Depends on what you want to do. If you have UIMA-AS services and you want
> to use DUCC to control their life cycle, see DuccBook Chapter 5, Service
> Management. To scale out collection processing, see chapter 8.
>
> If you have any specific needs for running UIMA-based analytics on one or
> more machines and don't see how to use DUCC, please describe them here.
>
>
> On Tue, Feb 11, 2014 at 3:28 PM, Bai Shen wrote:
>
> > Okay, I'll go ahead and redo my setup using UIMA-AS 2.4.2.
> >
> > How do I get DUCC to control my UIMA-AS setup?
> >
> >
> > On Tue, Feb 11, 2014 at 3:22 PM, Eddie Epstein wrote:
> >
> > > Sorry for the confusion. The UIMA-AS package is an SDK and does
> > > include most if not all the utilities in the core UIMA SDK. Please do
> > > stick with UIMA-AS v2.4.2.
> > >
> > > To be honest, I don't remember any discussion about UIMA-DUCC also
> > > being an SDK, a superset of UIMA-AS. It will certainly be discussed now.
> > >
> > > DUCC is a cluster controller that builds on UIMA-AS to automatically
> > > scale out UIMA analytics. The first sample application demonstrates
> > > scaling out a corpus processing task based on OpenNLP.
> > >
> > >
> > >
> > > On Tue, Feb 11, 2014 at 2:58 PM, Bai Shen wrote:
> > >
> > > > How else do I run the runPearInstaller.sh script?  I have a UIMA-AS
> > > > 2.4.1 deployment that I'm trying to change to work with DUCC.  Is
> > > > this a valid way forward or should I stick with UIMA-AS 2.4.2?
> > > >
> > > > If using DUCC instead of UIMA-AS is a valid path, how do I install my
> > > > pears?  Previously I installed the pears and then deployed them.
> > > > Then I was able to send a CAS to the queue and have it processed.
> > > >
> > > > I'm still trying to understand how all of the pieces interact and
> > > > what changes DUCC brings.
> > > >
> > > > Thanks.
> > > >
> > > >
> > > > On Tue, Feb 11, 2014 at 2:53 PM, Eddie Epstein wrote:
> > > >
> > > > > You should not need the UIMA-AS SDK installed.
> > > > >
> > > > > Eddie
> > > > >
> > > > >
> > > > > On Tue, Feb 11, 2014 at 12:14 PM, Bai Shen <baishen.li...@gmail.com>
> > > > > wrote:
> > > > >
> > > > > > So I need to install the UIMA SDK in addition to DUCC?  What about
> > > > > > UIMA-AS?
> > > > > >
> > > > > >
> > > > > > On Tue, Feb 11, 2014 at 11:21 AM, Eddie Epstein <eaepst...@gmail.com>
> > > > > > wrote:
> > > > > >
> > > > > > > The Pear installer is part of the standard UIMA SDK, not
> > > > > > > currently included in DUCC.
> > > > > > > Definitely something that should be clarified in DUCC.
> > > > > > >
> > > > > > > Thanks,
> > > > > > > Eddie
> > > > > > >
> > > > > > >
> > > > > > > On Tue, Feb 11, 2014 at 10:51 AM, Bai Shen <baishen.li...@gmail.com>
> > > > > > > wrote:
> > > > > > >
> > > > > > > > I've successfully set up DUCC in single user mode and run the
> > > > > > > > example job through it.
> > > > > > > >
> > > > > > > > Now I'd like to install my pears and attempt to send a CAS
> > > > > > > > through the system.  The DUCC book mentions the following.
> > > > > > > >
> > > > > > > > "Then install the UIMA pear file in the working directory with
> > > > > > > > the runPearInstaller script and test it with the UIMA Cas
> > > > > > > > Visual Debugger application."
> > > > > > > >
> > > > > > > > However, I cannot find any such script in my DUCC instance.
> > > > > > > > Googling has not proved fruitful.
> > > > > > > >
> > > > > > > > Can anyone point me towards instructions for installing a pear
> > > > > > > > on DUCC?
> > > > > > > >
> > > > > > > > Thanks.
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
>


Re: Unable to use ConceptMapper annotator

2014-02-12 Thread Peter Litsegård
Marshall Schor  writes:

> 
> Hi Peter,
> 
> Thanks for pointing this out.  I checked, and did see it there (in
> analysis_engine/primitive/DictTerm.xml)
> 
> So, to make this work, you have to change the spot where this is referenced,
> from trying to reference:
> 
>   "org/apache/uima/conceptMapper/DictTerm.xml" (won't be found in the jar at
> that spot) to
>   "analysis_engine/primitive/DictTerm.xml"  (where it is in the Jar)
> 
> if you want to reference that embedded copy. 
> 
> I'm not sure that's the proper way to do this, though... 
> I hope the documentation for this can be of help:
> http://uima.apache.org/d/uima-addons-current/ConceptMapper/ConceptMapperAnnotatorUserGuide.html
> 
> -Marshall
> 
> On 2/11/2014 1:52 AM, Peter Litsegård wrote:
> > Hi Marshall!
> >
> > Hmmm, the DictTerm.xml file is present and I've tried to put it in a number
> > of places to no avail. I thought that the error might be a typo in the
> > exception handling of a class-loader exception. I know, very far-fetched...:)
> >
> > Nevertheless, DictTerm.xml exists in the "uima-an-conceptMapper.jar" file
> > under the "analysis_engine.primitive" folder. Do you know what I need to do in
> > order for the DictTerm.xml file to be found?
> >
> >
> 
> 

Hi Marshall!

Thanks for trying to help me out here. The more I look into this the
trickier it gets...:(

Just to give you some additional background I've done the following:

1. downloaded the uima-core
2. downloaded the ConceptMapper.jar
3. referenced both of the above from my project

In my code I do the following:
XMLInputSource in = new XMLInputSource(
    "bin/descriptors/analysis_engine/ConceptMapperOffsetTokenizer.xml");
ResourceSpecifier specifier =
    UIMAFramework.getXMLParser().parseResourceSpecifier(in);
AnalysisEngine ae = UIMAFramework.produceAnalysisEngine(specifier);

When I try to invoke the 'produceAnalysisEngine(...)' method above, I get the
following error:

"...Caused by: org.apache.uima.util.InvalidXMLException: An import could not be
resolved.  No file with name "org/apache/uima/conceptMapper/DictTerm.xml" was
found in the class path or data path. (Descriptor:
file:/C:/.../ConceptMapperOffsetTokenizer.xml)
    at org.apache.uima.resource.metadata.impl.Import_impl.findAbsoluteUrl(Import_impl.java:115)
    at org.apache.uima.resource.metadata.impl.TypeSystemDescription_impl.resolveImports(TypeSystemDescription_impl.java:220)
    at org.apache.uima.resource.metadata.impl.TypeSystemDescription_impl.resolveImports(TypeSystemDescription_impl.java:202)
    at org.apache.uima.analysis_engine.metadata.impl.AnalysisEngineMetaData_impl.resolveImports(AnalysisEngineMetaData_impl.java:87)
    at org.apache.uima.resource.Resource_ImplBase.initialize(Resource_ImplBase.java:129)"

I simply can't understand why I get these errors, as I'm relying on the
defaults for the ConceptMapper and all the type systems are in place in the
jars. I'm far from an expert on UIMA, BUT I would have thought this should be
pretty straightforward, but no :( I thought DictTerm.xml, TokenAnnotation.xml,
etc. were "simple" type system declarations and nothing else. Why would the
ConceptMapper want to use these, as the types declared in those XML files have
already been "Cas generated" and their .class files are present in the CM-jar?

Sorry for the lengthy post but I want to be clear:)
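To see which of the two DictTerm.xml paths discussed in this thread is actually visible, a small classpath probe can help. This is a sketch, assuming that UIMA turns an import by name such as org.apache.uima.conceptMapper.DictTerm into the resource path org/apache/uima/conceptMapper/DictTerm.xml (consistent with the error message) and looks it up through the class loader; UIMA also searches the data path, which this sketch ignores. ImportCheck is a hypothetical helper and should be run with the same classpath as the application, including uima-an-conceptMapper.jar.

```java
import java.net.URL;

/** Hypothetical helper: probes the classpath for the DictTerm.xml locations. */
public final class ImportCheck {

    // Assumption: an import by name "a.b.C" is resolved as the resource "a/b/C.xml".
    static String nameToResourcePath(String importName) {
        return importName.replace('.', '/') + ".xml";
    }

    // True if the resource path is visible to this class's class loader.
    static boolean onClasspath(String resourcePath) {
        URL url = ImportCheck.class.getClassLoader().getResource(resourcePath);
        return url != null;
    }

    public static void main(String[] args) {
        String[] candidates = {
            // path derived from the import name in the error message
            nameToResourcePath("org.apache.uima.conceptMapper.DictTerm"),
            // where the file was reported to sit inside the jar
            "analysis_engine/primitive/DictTerm.xml"
        };
        for (String path : candidates) {
            System.out.println(path + " -> " + (onClasspath(path) ? "found" : "NOT found"));
        }
    }
}
```

If Marshall's observation holds, only analysis_engine/primitive/DictTerm.xml should be reported as found, which matches his suggestion to change the reference in the descriptor accordingly.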