unsubscribe

2016-06-27 Thread Thomas Ginter


-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org



Pulling data from a secured SQL database

2015-10-30 Thread Thomas Ginter
I am working in an environment where data is stored in MS SQL Server.  It has 
been secured so that only a specific set of machines can access the database 
through an integrated security Microsoft JDBC connection.  We also have a 
couple of beefy linux machines we can use to host a Spark cluster but those 
machines do not have access to the databases directly.  How can I pull the data 
from the SQL database on the smaller development machine and then have it 
distributed to the Spark cluster for processing?  Can the driver pull data and 
then distribute execution?

Thanks,

Thomas Ginter
801-448-7676
thomas.gin...@utah.edu








Re: UIMAj3 ideas

2015-07-16 Thread Thomas Ginter
Richard,

There is an API in UIMA for generating Analysis Engine Descriptors as well as 
Aggregates and Type System descriptions.  I use that API to generate the xml 
descriptor at runtime after the configuration has been completed.  I wrote my 
own logic to track the delegates of an Aggregate descriptor in order to 
propagate updates to/from delegates to allow the user to dynamically specify 
Analysis Engine parameters.  I also merged the scale out parameters for UIMA-AS 
into the Analysis Engine object for ease of configuration.  

In addition I wrote my own code to generate the deployment descriptor from the 
programmatic parameters provided.  The resulting XML is what the framework uses 
to generate the Spring Bean file you mentioned.

That being said, the existing API definitely has a learning curve, which was part 
of the motivation for creating Leo.

Thanks,

Thomas Ginter
801-448-7676
thomas.gin...@utah.edu




 On Jul 16, 2015, at 1:51 PM, Richard Eckart de Castilho r...@apache.org 
 wrote:
 
 Hi Thomas,
 
 On 16.07.2015, at 21:42, Thomas Ginter thomas.gin...@utah.edu wrote:
 
 Have you looked into using Leo?  It allows you to programmatically create 
 Analysis Engines, Aggregates, the type system, and launch everything in 
 UIMA-AS without having to manage any XML descriptors at all.  Furthermore it 
 is available via Maven so your code can compile and run.  
 
 Did you find an API in UIMA AS to handle the programmatic generation of 
 descriptors, or did you implement that yourself in Leo (as I had tried to in 
 DKPro Lab)? 
 
 If I remember correctly, then UIMA AS loaded plain XML descriptor files, 
 transforms them to a Spring Bean file using XSLT and then used Spring to 
 instantiate it. But I may have missed something.
 
 Cheers,
 
 -- Richard



Re: UIMAj3 ideas

2015-07-16 Thread Thomas Ginter
Hi Petr,

Have you looked into using Leo?  It allows you to programmatically create 
Analysis Engines, Aggregates, the type system, and launch everything in UIMA-AS 
without having to manage any XML descriptors at all.  Furthermore it is 
available via Maven so your code can compile and run.  

http://department-of-veterans-affairs.github.io/Leo/userguide.html

The only catch to running UIMA-AS is making sure the broker is running, a 
manual step that we have not yet automated.  Other than that, it can scale most 
pipelines, with the notable exception of pipelines that have really large 
resources.

As for ideas for UIMA 3, I would love to see a much simpler CAS system that 
didn’t require types to be predefined before execution, such as a very 
simple abstract base class that defines an “annotation” and is then extended in 
order to create/use a new type.  It seems like the basic location-based indexes 
could still be provided that way, as well as the option of extending them to 
provide custom indexes.  If the CAS were implemented as a base set of very 
simple Java objects, we would also have more serialization options, possibly 
even making it possible for the user to plug in a different serializer, such as 
protobuf, if required.  Just a thought.
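The idea above can be illustrated with a small plain-Java sketch. It is hypothetical and not part of any UIMA release: annotation types are ordinary subclasses of an abstract base class, the location index keeps UIMA's usual order (begin ascending, then end descending), and serialization goes through a pluggable interface, so a protobuf-based writer could in principle be dropped in. All names (SimpleCasSketch, Span, Token, Sentence, Serializer, select, selectCovered) are invented for the example.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

class SimpleCasSketch {
    // Hypothetical base class: every annotation type extends it, no type system descriptor.
    abstract static class Span {
        final int begin;
        final int end;
        Span(int begin, int end) { this.begin = begin; this.end = end; }
    }

    // User-defined types are ordinary Java subclasses.
    static final class Token extends Span {
        Token(int begin, int end) { super(begin, end); }
    }

    static final class Sentence extends Span {
        Sentence(int begin, int end) { super(begin, end); }
    }

    // Pluggable serialization: any implementation can be passed in.
    interface Serializer {
        String write(String text, List<Span> annotations);
    }

    // Location-based ordering: begin ascending, then end descending.
    static final Comparator<Span> LOCATION_ORDER =
        Comparator.comparingInt((Span s) -> s.begin)
                  .thenComparing((a, b) -> Integer.compare(b.end, a.end));

    final String text;
    private final List<Span> index = new ArrayList<>();

    SimpleCasSketch(String text) { this.text = text; }

    void add(Span annotation) {
        index.add(annotation);
        index.sort(LOCATION_ORDER); // re-sorting keeps the sketch short
    }

    // All annotations of the given class (or its subclasses), in location order.
    <T extends Span> List<T> select(Class<T> type) {
        List<T> out = new ArrayList<>();
        for (Span s : index) {
            if (type.isInstance(s)) {
                out.add(type.cast(s));
            }
        }
        return out;
    }

    // Annotations of the given class lying completely inside `cover`.
    <T extends Span> List<T> selectCovered(Class<T> type, Span cover) {
        List<T> out = new ArrayList<>();
        for (T t : select(type)) {
            if (t.begin >= cover.begin && t.end <= cover.end) {
                out.add(t);
            }
        }
        return out;
    }

    String serialize(Serializer serializer) {
        return serializer.write(text, new ArrayList<>(index));
    }
}
```

Re-sorting on every add is only for brevity; a real index would insert in place or sort lazily.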

Thanks,

Thomas Ginter
801-448-7676
thomas.gin...@utah.edu




 On Jul 16, 2015, at 10:25 AM, Petr Baudis pa...@ucw.cz wrote:
 
  Hi!
 
 On Fri, Jul 10, 2015 at 10:28:08AM -0400, Eddie Epstein wrote:
 Good comments which will likely generate lots of responses.
 For now please see comments on scaleout below.
 
 On Thu, Jul 9, 2015 at 6:52 PM, Petr Baudis pa...@ucw.cz wrote:
 
  * UIMAfit is not part of core UIMA and UIMA-AS is not part of core
UIMA.  It seems to me that UIMA-AS is doing things a bit differently
than what the original UIMA idea of doing scaleout was.  The two
things don't play well together.  I'd love a way to easily take
my plain UIMA pipeline and scale it out, ideally without any code
changes, *and* avoid the terrible XML config files.
 
 
 Not clear what you are referring to as the original UIMA idea of doing
 scaleout,
 the CPE? Core UIMA is a single threaded, embeddable framework. UIMA-AS
 is also an embeddable framework that offers flexible vertical
 (multi-threading) and
 horizontal (multi-process) options for deploying an arbitrary pipeline.
 Admittedly
 scaleout with UIMA-AS is complicated and the minimal support for process
 management make it difficult to do scaleout simply. In what ways do you
 think
 UIMA-AS is inconsistent with UIMA or UIMA scaleout?
 
  Well, my impression after delving into some UIMA internals was that
 the original idea was to use the Analysis Structure Broker to control
 the pipeline flow and it would seem natural that when doing scale-out,
 one would simply provide a different ASB.  Its javadoc even reads
 
 The Analysis Structure Broker (ASB) is the component
 responsible for the details of communicating with Analysis Engines
 that may potentially be distributed across different physical
 machines.
 
 Of course, maybe I got it wrong.
 
 DUCC is a full cluster management application that will scale out a plain UIMA
 pipeline with no code changes, assuming that the application code is
 threadsafe.
 But a typical pipeline with a single collection reader creating input CASes
 and a single CAS consumer will limit scaleout performance pretty quickly. DUCC
 makes it easy to eliminate the input data bottleneck. DUCC sample apps
 show one approach to eliminating the output bottleneck. Have you looked at
 DUCC?
 
  I use a UIMA pipeline for question answering, where each question
 currently takes ~30s (single-threaded) to process (a lot of it spent
 waiting on databases), so I don't think I'd hit such a bottleneck.
 I did spend a few tens of minutes looking at DUCC, but I got the
 impression that it's not really trivial to set up.
 
  One of my goals is to minimize setup hassles for anyone who wants to
 run my software - ideally, they should be able to just compile and run.
 If I started to use DUCC, I'm not sure to what degree I could preserve
 this, but at least it's another element in the already steep learning
 curve for anyone who wants to tinker with the system.
 
  (Then there's this whole issue of UIMA-AS vs. UIMAfit and in-memory
 resource sharing - though from one of your previous emails, I got the
 impression that I could run multiple AEs in threads of a single java
 process; but I guess at that point I had already decided that I wanted
 to try something less complex.)
 
 -- 
   Petr Baudis
   If you have good ideas, good data and fast computers,
   you can do almost anything. -- Geoffrey Hinton



Re: Generics in 2.8.0 getAllIndexedFS

2015-07-08 Thread Thomas Ginter
As long as the runtime error is meaningful and documented, I vote for 
option 3.  <T extends TOP> still limits the user to the family of the UIMA 
universe, so to speak, without limiting them to an explicit FS inheritance, which 
is a useful flexibility in spite of the risk of a casting error.
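The trade-off between option 2 and option 3 can be seen in a plain-Java sketch. The classes and methods below (Top, Token, Sentence, byClass, byName) are hypothetical stand-ins, not the UIMA API: byClass mirrors the Class<T> signature, while byName mirrors a generic method whose T is picked up from the assignment context, as in option 3. Because of type erasure, the unchecked cast inside byName never fails; the ClassCastException only surfaces where the caller reads an element.

```java
import java.util.ArrayList;
import java.util.List;

class GenericsSketch {
    // Hypothetical stand-ins for the JCas hierarchy; these are not the real UIMA classes.
    static class Top {}
    static class Token extends Top {}
    static class Sentence extends Top {}

    // A toy "index" holding one Token and two Sentences.
    static final List<Top> INDEX = List.of(new Token(), new Sentence(), new Sentence());

    // Option 2 style: the Class argument both selects and checks, so the result type is right.
    static <T extends Top> List<T> byClass(Class<T> clazz) {
        List<T> out = new ArrayList<>();
        for (Top fs : INDEX) {
            if (clazz.isInstance(fs)) {
                out.add(clazz.cast(fs));
            }
        }
        return out;
    }

    // Option 3 style: T comes from the caller's context; nothing ties it to typeName except
    // an unchecked cast, which is erased to a cast to Top and therefore never fails here.
    @SuppressWarnings("unchecked")
    static <T extends Top> List<T> byName(String typeName) {
        List<T> out = new ArrayList<>();
        for (Top fs : INDEX) {
            if (fs.getClass().getSimpleName().equals(typeName)) {
                out.add((T) fs);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<Sentence> sentences = byClass(Sentence.class); // type-safe
        System.out.println(sentences.size());

        List<Token> wrong = byName("Sentence");             // compiles without complaint
        try {
            Token t = wrong.get(0);                          // compiler-inserted cast fails here
            System.out.println(t);
        } catch (ClassCastException e) {
            System.out.println("ClassCastException at the use site");
        }
    }
}
```

In both styles a mistake ends in a runtime error; the difference is only where it appears, which is the point Marshall makes below.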

Thanks,

Thomas Ginter
801-448-7676
thomas.gin...@utah.edu




 On Jul 8, 2015, at 09:14, Marshall Schor m...@schor.com wrote:
 
 I agree that (3) is not safe.  However, disallowing it imposes a burden on the
 user (assuming they want to use some method that's in the type but not in TOP)
 to cast the result to the type.  This cast could also throw a runtime error, of 
 course.
 
 So, what I'm thinking is that there's no particular value in not allowing 3) -
 the user could cause a runtime error in either case;
 but not doing 3) would make UIMA get in the way of coders trying to get their
 work done :-) - for the case where they were doing proper type casting.
 
 On balance, it seems to me to be better to allow 3.
 
 re: using the older forms: yes, that's really not needed (except perhaps for
 edge cases), so could be deprecated.  At this point, I'm not sure that's worth
 doing, though...
 
 Here's one edge case (these are hard to think of :-) ).  The coder has a type
 hierarchy A -> B.  They define a JCas class for A, but not for B.
 To get all instances of B, they would need the older format.
 
 -Marshall
 
 On 7/8/2015 9:44 AM, Richard Eckart de Castilho wrote:
 If type inferencing from the surrounding context wasn't done, and the user 
 needed to cast the result, the user would be exposed to the same runtime 
 error.  So, unless there's some other pros/cons, it seems to me it would be 
 best to allow generic type inferencing in cases where there's a type 
 specified (by any means) in the getAllIndexedFS method call.
 I'd not say by any means.
 
 using JCas APIs:
 1) FSIterator<TOP> getAllIndexedFS(aType);
 2) <T extends TOP> FSIterator<T> getAllIndexedFS(Class<T> clazz)
 3) <T extends TOP> FSIterator<T> getAllIndexedFS(aType)
 
 I'd consider 1 and 2 to be safe and ok:
 - 1 is guaranteed to return TOP or a subtype of it.
 - 2 is guaranteed to return clazz or a subtype of it.
 
 3 is not safe:
 
 FSIterator<Token> i = getAllIndexedFS(Sentence.type)
 
 This causes a runtime error.
 
 Question: except for history reasons, why do we need the aType
 signature in a JCas context at all? Couldn't it be deprecated
 in favor of the type-safe clazz variant?
 
 -- Richard
 
 On 08.07.2015, at 15:24, Marshall Schor m...@schor.com wrote:
 
 More about the signatures and type inference.
 
 We have the following cases:
 
 (maybe) not JCas, using CAS APIs: 
 (maybe because a JCas user might get a CAS - not a JCas - in some 
 routine)
 
   (no arguments in getAllIndexedFS)
 FSIterator... getAllIndexedFS();
 
   (type argument in getAllIndexedFS) 
 FSIterator... getAllIndexedFS(aType);
 
 using JCas APIs:
   (no arguments in getAllIndexedFS)
 FSIterator... getAllIndexedFS();
 
   (type argument in getAllIndexedFS) 
 FSIterator... getAllIndexedFS(aType);
 FSIterator... getAllIndexedFS(Class<Foo> clazz)
 
 For the getAllIndexedFS() (no argument) kinds of calls, I think there's 
 agreement to use the generic FeatureStructure for the CAS APIs, and TOP for 
 the JCas APIs.
 
 When the getAllIndexedFS is given type arguments, the method returns an 
 iterator over that type and its subtypes.  Here it seems best to use the 
 JCas type corresponding to the type argument.  This is easy to do in the 
 last case, above.  It can be allowed if the other calls use generic 
 method forms and pick up the type from the surrounding context.
 
 The pro for doing this is that it makes UIMA more coder-friendly, by not 
 requiring the coder to cast the result.
 The con for doing this is that it allows the coder to make a mistake 
 (specifying the wrong type).  This would only be caught at run time.
 
 



Re: UIMAFit and UIMA-AS deployment

2015-05-14 Thread Thomas Ginter
There is also Leo, which allows you to programmatically create pipelines, launch 
them as UIMA-AS services, and manage type systems and clients without having 
to touch any descriptor files.  You can find documentation at the site below:

http://department-of-veterans-affairs.github.io/Leo/userguide.html

Thanks,

Thomas Ginter
801-448-7676
thomas.gin...@utah.edu




 On Apr 30, 2015, at 11:33, Richard Eckart de Castilho r...@apache.org wrote:
 
 Hi,
 
 I have tried once to use UIMA-AS and I did it in conjunction with uimaFIT.
 
 At the time, I didn't find any API to programmatically create UIMA-AS
 deployment descriptors. It appeared to me as if UIMA-AS extracted
 information directly from the XML stream - but then I maybe didn't dig
 deep enough.
 
 Anyway, I created a class to programmatically build a subset of the 
 UIMA-AS deployment descriptor.
 
 You find this class and some code using it here:
 
 https://code.google.com/p/dkpro-lab/source/browse/dkpro-lab-uima-engine-uimaas/src/main/java/de/tudarmstadt/ukp/dkpro/lab/uima/engine/uimaas/AsDeploymentDescription.java
 
 Maybe it helps you.
 
 Cheers,
 
 -- Richard
 
 On 30.04.2015, at 17:40, Sylvain Surcin sur...@kwaga.com wrote:
 
 Hello,
 
 I'm trying to see if I can adapt our UIMA-AS architecture to UIMAFit.
 
 And I'm wondering how to actually do it from the main level where I have a
 class
 
 UimaAsynchronousEngine myEngine = new BaseUIMAAsynchronousEngine_impl();
 myEngine.addStatusCallbackListener(myListener);
 myEngine.deploy(myAsDeploymentDescriptorFile, applicationContext);
 
 The AS deployment descriptor file has a section
 <topDescriptor>
   <import location="./MyAggregateChain.xml"/>
 </topDescriptor>
 
 Now, if I want to be smart and use UIMAFit's AggregateBuilder, how do I
 reconcile that with the deployment descriptor file?
 
 Is there a way to do that entirely from within the Java code?
 Or do I have to use UIMAFit to generate the aggregate descriptor file from
 the AnalysisEngine built by the AggregateBuilder?
 
 Thanks for your help,
 
 
 Sylvain SURCIN, Ph.D.
 *KWAGA*
 Senior Software Architect
 15, rue Jean-Baptiste Berlier
 75013 Paris
 France
 Tél.: +33 (0)1.55.43.79.20
 



Re: Read file name in an annotator

2014-07-09 Thread Thomas Ginter
Hi Debbie,

The file name is not provided by default in UIMA, although I believe the UIMA 
FileReader does populate a SourceDocumentInformation annotation with this 
information.  Our group has a set of readers that populate our own annotation 
type to provide location data and other meta-information for each record (CAS) 
being processed.  In short, you will be better off writing your reader so that 
it provides that information for you.
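On the annotator side, if your reader stores the document location as a URI string (for example in a SourceDocumentInformation-style annotation; the exact feature name depends on your type system), turning that into a bare file name is plain Java. The helper below is a hypothetical sketch and does not use any UIMA API; it assumes a hierarchical URI such as a file: URI.

```java
import java.net.URI;

class SourceFileName {
    // Hypothetical helper: returns the last path segment of a URI string such as
    // "file:/data/notes/report1.txt". getPath() also decodes escapes like %20.
    static String fileNameFromUri(String uri) {
        String path = URI.create(uri).getPath();
        if (path == null || path.isEmpty()) {
            throw new IllegalArgumentException("URI has no path: " + uri);
        }
        int slash = path.lastIndexOf('/');
        return slash >= 0 ? path.substring(slash + 1) : path;
    }
}
```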

Thanks,

Thomas Ginter
801-448-7676
thomas.gin...@utah.edu




On Jul 9, 2014, at 5:41, Debbie Zhang debbie.d.zh...@gmail.com wrote:

 Hi,
 
 Can anyone tell me how to read the file name in an annotator using the
 JCas? It seems the DocumentAnnotation doesn't contain the file name. Thank
 you!
 
 Best regards,
 
 Debbie Zhang



Re: FilteredIterator is very slow

2014-03-31 Thread Thomas Ginter
Larry,

A faster way to get the annotations of the types that you will skip would be to 
do the following:

AnnotationIndex<Annotation> titlePersonHAIndex = 
aJCas.getAnnotationIndex(TitlePersonHonorificAnnotation.type);

Doing this for each type will yield an index that points to just the 
annotations of that type in the CAS.  From there you can get an iterator 
( titlePersonHAIndex.iterator() ) and either traverse each one separately or 
else add them to a common Collection such as an ArrayList and iterate through 
that.  You could also take advantage of the fact that the default annotation 
index in UIMA sorts in ascending order of begin offset and descending order of 
end offset, so you can stop once you have traversed past the end offset of the 
dictTerm.  
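A minimal sketch of that early-stop idea in plain Java, assuming each annotation is reduced to a hypothetical Span with [begin, end) offsets and that "overlap" means the two ranges share at least one character (Larry's UIMAUtils.annotationsOverlap may define it differently). The filter annotations of all three types are merged and sorted once per CAS in the default index order, and the scan for each dictTerm stops at the first filter that begins at or after the dictTerm's end.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

class OverlapSketch {
    // Hypothetical stand-in for an annotation: [begin, end) character offsets.
    static final class Span {
        final int begin;
        final int end;
        Span(int begin, int end) { this.begin = begin; this.end = end; }
    }

    // Same ordering as UIMA's default annotation index: begin ascending, then end descending.
    static final Comparator<Span> INDEX_ORDER =
        Comparator.comparingInt((Span s) -> s.begin)
                  .thenComparing((a, b) -> Integer.compare(b.end, a.end));

    // Assumed overlap test: the two half-open ranges share at least one character.
    static boolean overlaps(Span a, Span b) {
        return a.begin < b.end && b.begin < a.end;
    }

    // Merge the per-type filter annotations once per CAS and sort them in index order.
    @SafeVarargs
    static List<Span> mergeSorted(List<Span>... perType) {
        List<Span> all = new ArrayList<>();
        for (List<Span> spans : perType) {
            all.addAll(spans);
        }
        all.sort(INDEX_ORDER);
        return all;
    }

    // Walk the sorted filters and stop at the first one that begins at or after the
    // dictTerm's end: every later filter begins even further right, so none can overlap.
    static boolean shouldSkip(Span dictTerm, List<Span> sortedFilters) {
        for (Span filter : sortedFilters) {
            if (filter.begin >= dictTerm.end) {
                return false;
            }
            if (overlaps(dictTerm, filter)) {
                return true;
            }
        }
        return false;
    }
}
```

The merged list is built once per CAS and reused for every dictTerm, which avoids repeatedly resetting a filtered iterator.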

An important design decision, though, is whether the dictTerm annotations are 
much more numerous than the TitlePersonHonorificAnnotation, 
MeasurementAnnotation, and ProgFactorTerm filtering annotation types.  
Generally, if the filter types are much more plentiful and the dictTerm type is 
rarer, then looking for overlapping filter types will yield fewer iterations of 
your algorithm; however, if there are a lot of dictTerm occurrences and only a 
few of the filter types, then it may be more efficient to iterate through the 
filter types and eliminate dictTerms that overlap or are covered.  

Thanks,

Thomas Ginter
801-448-7676
thomas.gin...@utah.edu




On Mar 31, 2014, at 11:47 AM, Kline, Larry larry.kl...@mckesson.com wrote:

 When I use a filtered FSIterator it's an order of magnitude slower than a 
 non-filtered iterator.  Here's my code:
 
 Create the iterator:
   private FSIterator<Annotation> createConstrainedIterator(JCas aJCas)
       throws CASException {
     FSIterator<Annotation> it = aJCas.getAnnotationIndex().iterator();
     FSTypeConstraint constraint =
         aJCas.getConstraintFactory().createTypeConstraint();
     constraint.add((new TitlePersonHonorificAnnotation(aJCas)).getType());
     constraint.add((new MeasurementAnnotation(aJCas)).getType());
     constraint.add((new ProgFactorTerm(aJCas)).getType());
     it = aJCas.createFilteredIterator(it, constraint);
     return it;
   }
 Use the iterator:
   public void process(JCas aJCas) throws AnalysisEngineProcessException {
  ...
 // The following is done in a loop
   if (shouldSkip(dictTerm, skipIter))
  continue;
  ...
   }
 Here's the method called:
   private boolean shouldSkip(G2DictTerm dictTerm, FSIterator<Annotation> skipIter)
       throws CASException {
  boolean shouldSkip = false;
  skipIter.moveToFirst();
  while (skipIter.hasNext()) {
 Annotation annotation = skipIter.next();
 if (UIMAUtils.annotationsOverlap(dictTerm, annotation)) {
   shouldSkip = true;
   break;
 }
  }
  return shouldSkip;
   }
 
 If I change the method, createConstrainedIterator(), to this (that is, no 
 constraints):
   private FSIterator<Annotation> createConstrainedIterator(JCas aJCas)
       throws CASException {
     FSIterator<Annotation> it = aJCas.getAnnotationIndex().iterator();
     return it;
   }
 
 It runs literally 10 times faster.  Doing some profiling I see that all of 
 the time is spent in the skipIter.moveToFirst() call.  I also tried creating 
 the filtered iterator each time anew in the shouldSkip() method instead of 
 passing it in, but that has even slightly worse performance.
 
 Given this performance I suppose I should probably use a non-filtered 
 iterator and just check for the types I'm interested in inside the loop.
 
 Any other suggestions welcome.
 
 Thanks,
 Larry Kline
 
 



Re: uima jcas get annotation type from string

2014-02-14 Thread Thomas Ginter
Once you have the Type object, you can get an index over all the annotations of 
that type in the CAS using:

AnnotationIndex<Annotation> mySentenceIndex = 
jcas.getAnnotationIndex(mySentenceTypeObj);

Then you can get an iterator over the index using:

FSIterator<Annotation> mySentenceIterator = mySentenceIndex.iterator();

or you could just use Java's enhanced for loop syntax, such as:

for(Annotation sentence : mySentenceIndex) {
/** Do something cool **/
}

The AnnotationLibrarian class in the Leo framework provides some pretty 
convenient methods for this as well such as:

Collection<Sentence> sentenceList = 
AnnotationLibrarian.getAllAnnotationsOfType(jcas, mySentenceTypeObj);

which returns a collection of the Sentence annotations.  You can find more 
information about the Leo framework at the following URL:

http://decipher.chpc.utah.edu/sites/gov.va.vinci/leo/2014.01.8/

Thanks,

Thomas Ginter
801-448-7676
thomas.gin...@utah.edu




On Feb 14, 2014, at 2:50 AM, Richard Eckart de Castilho r...@apache.org wrote:

 On 14.02.2014, at 09:50, hannes schantl johannes.scha...@gmail.com wrote:
 
 thanks for the answers.
 
 Is there also a way to get a Type from a String, which can be used as
 argument for the JCasUtil.select method?
 
 The JCasUtil methods assume that you have access to JCas classes, e.g.
 
  import mypackage.AnnotationType;
  JCasUtil.select(jcas, AnnotationType.class);
 
 If you want to select based on names/types, not on JCas-classes, you could
 consider using the CasUtil methods:
 
  CAS cas = jcas.getCas(); // Or inherit from CasAnnotator_ImplBase
  Type annotationType = CasUtil.getType(cas, "mypackage.AnnotationType");
  CasUtil.select(cas, annotationType);
 
 Of course, you could also use reflection to get the class for your annotation
 type and pass it to JCasUtil - but that would be redundant and would require
 handling various exceptions:
 
  JCasUtil.select(jcas, Class.forName("mypackage.AnnotationType").asSubclass(Annotation.class));
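The reflection route can be made a little safer with Class.asSubclass, which turns the Class<?> returned by Class.forName into a bounded Class object and lets a wrong name or a wrong base type be reported as one exception. A minimal sketch, kept free of UIMA so it runs on its own; with uimaFIT you would presumably pass Annotation.class as the base and hand the result to JCasUtil.select, assuming the JCas classes are visible to the calling class loader. The helper name loadSubclass is hypothetical, and CasUtil remains the simpler option described above.

```java
class TypeClassLoader {
    // Resolve a fully qualified class name and check that it is a subclass of `base`.
    // Both failure modes are reported as IllegalArgumentException.
    static <T> Class<? extends T> loadSubclass(String className, Class<T> base) {
        try {
            return Class.forName(className).asSubclass(base);
        } catch (ClassNotFoundException e) {
            throw new IllegalArgumentException("No class named " + className, e);
        } catch (ClassCastException e) {
            throw new IllegalArgumentException(
                className + " is not a subclass of " + base.getName(), e);
        }
    }
}
```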
 
 I want to use the type object to get all Annotations of type Sentence from
 the CAS, and further extract all Annotations within each
 sentence. There are surely other ways to solve this issue without using
 JCasUtil, but JCasUtil seems to provide an easy way to do this with the
 methods
 JCasUtil.select and JCasUtil.selectCovered.
 
 CasUtil largely mirrors the functionality of JCasUtil. In fact, JCasUtil calls
 out to CasUtil for most of the grunt work.
 
 Cheers,
 
 -- Richard
 
 greetings Hannes
 
 
 Am 13.02.2014 22:11, schrieb Thomas Ginter:
 
 There are a couple of different ways to get a pointer to a specific Type 
 object.
 
 jcas.getRequiredType("mypackage.AnnotationType");
 (cas|jcas).getTypeSystem().getType("mypackage.AnnotationType");
 
 The question is what do you want to do with the Type object once you have it.
 
 Thanks,
 
 Thomas Ginter
 801-448-7676
 thomas.gin...@utah.edu
 
 On Feb 13, 2014, at 6:03 AM, hannes schantl
 johannes.scha...@gmail.com johannes.scha...@gmail.com wrote:
 
 
 Hi,
 
 Is there a way to get an annotation Type from the CAS (or JCas) from a
 string?
 For example, I am looking for something like this:
 jcas.getCasType("AnnotationName")
 
 greetings Hannes
 



Re: uima jcas get annotation type from string

2014-02-13 Thread Thomas Ginter
There are a couple of different ways to get a pointer to a specific Type object.  

jcas.getRequiredType("mypackage.AnnotationType");
(cas|jcas).getTypeSystem().getType("mypackage.AnnotationType");

The question is what do you want to do with the Type object once you have it.

Thanks,

Thomas Ginter
801-448-7676
thomas.gin...@utah.edu




On Feb 13, 2014, at 6:03 AM, hannes schantl johannes.scha...@gmail.com wrote:

 Hi,
 
 Is there a way to get an annotation Type from the CAS (or JCas) from a
 string?
 For example, I am looking for something like this:
 jcas.getCasType("AnnotationName")
 
 greetings Hannes



Re: uima-as 2.3.1 - java.io.IOException: Frame size of 147 MB larger than max allowed 100 MB

2014-01-23 Thread Thomas Ginter
1.  Your annotators can remove annotations as well as add them.  If there is a 
large number of annotations that you don’t really need, you could have a 
cleanup annotator that removes the extra ones, or else just don’t generate them 
in the first place, whichever works best for your algorithm.
2.  CASes sent to remote services in your pipeline are serialized the same way 
as CASes returned to the client.  In fact, the framework essentially creates a 
client interface for sending and receiving CAS objects and then passes them 
to/from your pipeline.  It is likely, then, that your expansion happens after 
the remote service is called, or else the CAS is not yet big enough at that 
point to exceed the 100MB limit.

Thanks,

Thomas Ginter
801-448-7676
thomas.gin...@utah.edu




On Jan 23, 2014, at 12:53 AM, Mihaela M mmihaela1...@yahoo.com wrote:

 1. I will upgrade uima-as and review the annotations gathered in the CAS, but 
 is there a way to have the CAS reset before sending it to the client? In my case 
 I only want to get the status of the processing, not all the annotations 
 found, because they were handled by the consumers configured in the pipeline 
 anyway.
 
 2. Do you know whether the aggregates communicate with the clients the same 
 way as with the remote CAS consumers? I wonder why it did not complain while 
 sending the exploded CAS to the remote consumer, but it did when 
 communicating with the client.
 
 Thank you!
 Mihaela
 
 
 
 On Wednesday, January 22, 2014 7:07 PM, Thomas Ginter 
 thomas.gin...@utah.edu wrote:
 
 Mihaela,
 
 There are two things that you should probably do in order to get started with 
 these issues.
 
 1.  Upgrade to UIMA-AS 2.4.2 which uses a newer version of ActiveMQ and 
 contains numerous bug fixes for UIMA-AS related to how the JMS queues are 
 handled.
 2.  The UIMA-AS framework adds very little as far as overhead space for the 
 CAS objects which means the vast majority of the size expansion from 48KB to 
 147MB is coming from annotations/metadata being added by your service.  
 Increasing the frame size in ActiveMQ may allow your CAS objects to be 
 transferred in JMS but it is more important to find out what is causing this 
 dramatic expansion and whether or not the service can be written differently 
 so that the expansion is much smaller.
 
 Thanks,
 
 Thomas Ginter
 801-448-7676
 thomas.gin...@utah.edu
 
 
 
 
 
 On Jan 22, 2014, at 9:44 AM, Mihaela M mmihaela1...@yahoo.com wrote:
 
 Hello,
 
 I have a uima pipeline that uses uima-as 2.3.1 which has one aggregator with 
 one local annotator, one remote consumer and one remote annotator. It 
 actually has more components but I will get into exactly the configuration 
 only if needed.
 I have also developed a UIMA client for it using the class 
 UimaAsynchronousEngine, the method sendCas (asynchronous, as far as I understood), and a 
 callback listener that waits for the processing to complete.
 
 1. I have noticed that the CAS returned is, in general, quite big. Is there a 
 way to send, at least to the client, a CAS that does not contain all the 
 types that the various annotators added? When could I remove those things 
 from the CAS?
 2. I send a 48 KB text message for processing; it gets processed 
 successfully by the pipeline, but the pipeline fails to send a reply to the 
 client. The exception that I get is:
 
 01/21/2014 07:36:02.978 [ActiveMQ Transport:
 tcp://localhost/127.0.0.1:61616] [DEBUG] 
 org.apache.activemq.ActiveMQConnection
 - Async exception with no exception listener: java.io.IOException: Frame size
 of 147 MB larger than max allowed 100 MB
 java.io.IOException: Frame size of 147 MB larger than max
 allowed 100 MB
  at
 org.apache.activemq.openwire.OpenWireFormat.unmarshal(OpenWireFormat.java:277)
 ~[activemq-core-5.6.0.jar:5.6.0]
  at
 org.apache.activemq.transport.tcp.TcpTransport.readCommand(TcpTransport.java:229)
 ~[activemq-core-5.6.0.jar:5.6.0]
  at
 org.apache.activemq.transport.tcp.TcpTransport.doRun(TcpTransport.java:221)
 ~[activemq-core-5.6.0.jar:5.6.0]
  at
 org.apache.activemq.transport.tcp.TcpTransport.run(TcpTransport.java:204)
 ~[activemq-core-5.6.0.jar:5.6.0]
  at
 java.lang.Thread.run(Thread.java:662) [na:1.6.0_30]
 01/21/2014 07:36:03.093 [ActiveMQ Connection Executor:
 tcp://localhost/127.0.0.1:61616] [DEBUG]
 org.apache.activemq.transport.tcp.TcpTransport - Stopping transport
 tcp://localhost/127.0.0.1:61616
 
 As far as I understood, the client connects via JMS to the uima pipeline and 
 a temporary reply queue gets created where the reply from the pipeline 
 should be sent and then consumed by the client. After the above exception is 
 thrown, the connection to the pipeline gets closed and the temp queue is 
 automatically deleted, hence the client no longer receives the reply.
 
 I am wondering why the error I was mentioning is not thrown while the 
 aggregator sends the CAS to the consumer, because the consumer

Re: how to dynamically set a required annotation type from within a UIMAfit annotator?

2013-12-05 Thread Thomas Ginter
Renaud,

We (the clinical NLP group at the University of Utah) have written a platform 
that sits on top of UIMA-AS and allows you to dynamically assign and even 
generate types for annotation engines.  We have a whole family of annotators 
whose parameters are dynamic using this platform.  We are almost ready to 
release it as open source, though that is probably still another month or two 
out.  Until then, we are open to collaboration opportunities wherein we 
give you access to the software and teach you how it is used.

Thanks,

Thomas Ginter
801-448-7676
thomas.gin...@utah.edu




On Dec 5, 2013, at 3:43 AM, Richard Eckart de Castilho r...@apache.org wrote:

 To my knowledge, the capabilities are part of the descriptor which must be 
 available before the AE is initialized. You cannot retroactively change the
 descriptor of a method from within its initialize() method.
 
 It would be nice to have something like this, though. But that would also mean
 switching any flow controllers which use this information from a static 
 planning
 to a dynamic planning approach.
 
 How about filing a feature request against the UIMA framework?
 
 -- Richard
 
 On 05.12.2013, at 08:35, Renaud Richardet renaud.richar...@gmail.com wrote:
 
 I find it very convenient to add
 
 @TypeCapability(inputs = { TOKEN, SENTENCE, COOCCURRENCE })
 so that I can ensure that dependencies are met. But sometimes, the
 dependencies are dynamic (e.g. an input type capability is part of the
 config of an annotator, and is loaded dynamically, see code below).
 
 Is there a way to dynamically set a required annotation type from within a
 UIMAfit annotator? Something like:
 
   @Override
   public void initialize(UimaContext context)
       throws ResourceInitializationException {
     super.initialize(context);
     try {
       // loading annotation class dynamically
       requiredAnnotation = (Class<? extends Annotation>) Class.forName(
           "org.uima.MyRequiredAnnotation");
       // adding it as TypeCapability's input
       context.getMetadata().addCapabilityInput(requiredAnnotation);
     } catch (Exception e) {
       throw new ResourceInitializationException(e);
     }
   }
 
 
 Thanks, Renaud
 



Re: Working with very large text documents

2013-10-18 Thread Thomas Ginter
Armin,

It would probably be more efficient to have a CollectionReader that splits the 
log file so you're not passing a gigantic file in RAM from the reader to the 
annotators before splitting it.  If it were me, I would split the log file by 
days or hours, with a maximum chunk size that automatically segments on line 
boundaries.  If you're using UIMA-AS, you can further scale your processing 
pipeline to increase throughput well beyond what the CPE can provide.  Also, 
with UIMA-AS it is easy to create a listener that gathers the aggregate 
processed data from the segments that are returned.
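A minimal sketch of that split, assuming (hypothetically) that every log line starts with an ISO date such as 2013-10-18, so the first 10 characters identify the day. Each chunk would become one CAS; a new chunk starts when the day changes or when adding the next line would push the chunk past maxChars, and lines are never cut in half. In a real CollectionReader the lines would be streamed from the file rather than held in a list; LogChunker and its methods are invented names.

```java
import java.util.ArrayList;
import java.util.List;

class LogChunker {
    // Assumption: each line begins with a yyyy-MM-dd date.
    static String dayOf(String line) {
        return line.length() >= 10 ? line.substring(0, 10) : "";
    }

    // Group whole lines into chunks by day, starting a new chunk when the day changes
    // or when the next line would make the chunk exceed maxChars. A single oversize
    // line still gets a chunk of its own.
    static List<String> chunk(List<String> lines, int maxChars) {
        List<String> chunks = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        String currentDay = null;
        for (String line : lines) {
            String day = dayOf(line);
            boolean dayChanged = currentDay != null && !day.equals(currentDay);
            boolean tooBig = current.length() > 0
                && current.length() + line.length() + 1 > maxChars;
            if (dayChanged || tooBig) {
                chunks.add(current.toString());
                current.setLength(0);
            }
            current.append(line).append('\n');
            currentDay = day;
        }
        if (current.length() > 0) {
            chunks.add(current.toString());
        }
        return chunks;
    }
}
```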

Thanks,

Thomas Ginter
801-448-7676
thomas.gin...@utah.edu




On Oct 18, 2013, at 7:58 AM, armin.weg...@bka.bund.de wrote:

 Dear Jens, dear Richard,
 
 Looks like I have to use a log-file-specific pipeline. The problem was that I 
 did not know it before the process crashed. It would be so nice to have a 
 general approach.
 
 Thanks,
 Armin
 
 -Original Message-
 From: Richard Eckart de Castilho [mailto:r...@apache.org] 
 Sent: Friday, October 18, 2013 12:32
 To: user@uima.apache.org
 Subject: Re: Working with very large text documents
 
 Hi Armin,
 
 that's a good point. It's also an issue with UIMA then, because the begin/end 
 offsets are likewise int values.
 
 If it is a log file, couldn't you split it into sections of e.g.
 one CAS per day and analyze each one? If there are long-distance relations 
 that span days, you could add a second pass which reads in all analyzed CASes 
 for a rolling window of e.g. 7 days and tries to find the long-distance 
 relations in that window.
 
 -- Richard
 
 On 18.10.2013, at 10:48, armin.weg...@bka.bund.de wrote:
 
 Hi Richard,
 
 As far as I know, Java strings cannot be longer than Integer.MAX_VALUE (2^31 - 1) characters, even on 64-bit VMs.
 
 Armin
 
 -Original Message-
 From: Richard Eckart de Castilho [mailto:r...@apache.org]
 Sent: Friday, October 18, 2013 10:43
 To: user@uima.apache.org
 Subject: Re: Working with very large text documents
 
 On 18.10.2013, at 10:06, armin.weg...@bka.bund.de wrote:
 
 Hi,
 
 What do you do with very large text documents in a UIMA pipeline, for 
 example 9 GB in size?
 
 In that order of magnitude, I'd probably try to get a computer with 
 more memory ;)
 
 A. I expect that you split the large file before putting it into the 
 pipeline. Or do you use a multiplier in the pipeline to split it? Either way, 
 where do you split the input file? You cannot just split it anywhere; 
 there is a real risk of breaking the content. Is there a 
 preferred chunk size for UIMA?
 
 The chunk size would likely not depend on UIMA, but rather on the machine 
 you are using. If you cannot split the data in defined locations, maybe you 
 can use a windowing approach where two splits have a certain overlap?
 
 B. Another possibility might be not to store the data in the CAS at all and 
 use a URI reference instead. It's then up to the analysis engine how to 
 load the data. My first idea was to use java.util.Scanner for regular 
 expressions, for example. But I think you need to have the whole text 
 loaded to iterate over annotations. Or is it just 
 AnnotationFS.getCoveredText() that would not work? Any suggestions here?
 
 No idea unfortunately, never used the stream so far.
 
 -- Richard
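Richard's windowing idea (splits that overlap by a fixed amount) can be sketched as a pure offset calculation. This is a hedged illustration, not a UIMA API: OverlappingChunker is a hypothetical name, and the offsets are long so the same arithmetic applies to positions in a 9 GB file, which cannot be held in a single Java String or addressed with int annotation offsets.

```java
import java.util.ArrayList;
import java.util.List;

public final class OverlappingChunker {
    /**
     * Returns [begin, end) windows of at most chunkSize characters covering
     * [0, length), where consecutive windows share `overlap` characters.
     */
    public static List<long[]> windows(long length, long chunkSize, long overlap) {
        if (chunkSize <= 0 || overlap < 0 || overlap >= chunkSize) {
            throw new IllegalArgumentException("require 0 <= overlap < chunkSize");
        }
        List<long[]> result = new ArrayList<>();
        long step = chunkSize - overlap;
        for (long begin = 0; begin < length; begin += step) {
            long end = Math.min(begin + chunkSize, length);
            result.add(new long[] {begin, end});
            if (end == length) {
                break; // this window reached the end; avoid a redundant tail window
            }
        }
        return result;
    }
}
```

An annotation found at local offset k in a window maps back to absolute offset begin + k. Annotations that fall inside an overlap region can show up in two windows, so a merge step would need to deduplicate them.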
 
 
 



Re: HashMap as type feature

2013-10-17 Thread Thomas Ginter
Armin,

Yes.  When you extract the key set and the values into arrays, the n-th element 
of the key array corresponds to the n-th element of the values array.  That is 
part of how HashMap is handled in Java.  Even if you implemented your own 
sorting algorithm for insertion, each value would be inserted together with its 
key, and the corresponding key and value arrays would still match.  The only 
caveat would be if you manipulated the keys array independently after 
getting it from the HashMap.
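If you would rather not depend on the relative order of keySet() and values() at all, one option is to fill both arrays in a single pass over entrySet(), so each index pair comes from the same entry by construction. A minimal sketch (ParallelArrays is a hypothetical helper name); the two resulting String[] arrays could then be handed to StringArray.copyFromArray as in the earlier code.

```java
import java.util.Map;

public final class ParallelArrays {
    /**
     * Returns {keys, values} where keys[i] and values[i] come from the same
     * map entry, regardless of the map's iteration order.
     */
    public static String[][] toParallelArrays(Map<String, String> map) {
        String[] keys = new String[map.size()];
        String[] values = new String[map.size()];
        int i = 0;
        for (Map.Entry<String, String> e : map.entrySet()) {
            keys[i] = e.getKey();
            values[i] = e.getValue();
            i++;
        }
        return new String[][] {keys, values};
    }
}
```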

Thanks,

Thomas Ginter
801-448-7676
thomas.gin...@utah.edu




On Oct 17, 2013, at 8:43 AM, armin.weg...@bka.bund.de wrote:

 Hi Thomas,
 
 thanks for your answer. Using HashMap, does the n-th element of keySet() 
 always correspond to the n-th element of values()? Is this defined 
 behavior in Java?
 
 Cheers,
 Armin
 
 -Original Message-
 From: Thomas Ginter [mailto:thomas.gin...@utah.edu] 
 Sent: Wednesday, 16 October 2013 18:53
 To: user@uima.apache.org
 Subject: Re: HashMap as type feature
 
 Armin,
 
 Our team does this with an annotation type designed to store feature vectors 
 for Machine Learning applications.  In this case we use a StringArray feature 
 for the keys and a StringArray feature for the values.  The StringArrays are 
 pulled from a HashMap<String, String> vector variable and inserted into the 
 features with the following code:
 
 int size = vector.size();
 StringArray keys = new StringArray(jcas, size);
 StringArray values = new StringArray(jcas, size);
 keys.copyFromArray(vector.keySet().toArray(new String[size]), 0, 0, size);
 values.copyFromArray(vector.values().toArray(new String[size]), 0, 0, size);
 
 Retrieving the values is fairly straightforward.  If you are using a static 
 annotation type it can be as simple as:
 
 StringArray keys = vector.getKeysArray();
 
 If you parameterize the annotation type in the annotator, you can use the 
 name of the feature to get a Feature object reference, then pull the 
 StringArrays like so:
 
 //parameter is the fully qualified name of the Annotation type
 Type annotationTypeObj = aJCas.getRequiredType("com.my.Annotation");
 //the actual name of the feature storing the key StringArray reference
 Feature keyFeature = annotationTypeObj.getFeatureByBaseName(keyFeatureName);
 //the name of the values feature
 Feature valuesFeature = annotationTypeObj.getFeatureByBaseName(valuesFeatureName);
 
 //Get a list of the annotation objects in the CAS, then iterate through the 
 //list; for each annotation 'a' do the following to retrieve the keys and values
 
 StringArray keys = (StringArray) a.getFeatureValue(keyFeature);
 StringArray values = (StringArray) a.getFeatureValue(valuesFeature);
 
 If necessary you can retrieve a String[] from the StringArray 
 FeatureStructure by calling the .toArray() method such as:
 
 String[] keysArray = keys.toArray();
 
 Let me know if you have any questions.
 
 Thanks,
 
 Thomas Ginter
 801-448-7676
 thomas.gin...@utah.edu
 
 
 
 
 On Oct 16, 2013, at 9:55 AM, Dr. Armin Wegner 
 arminweg...@googlemail.com wrote:
 
 Hi,
 
 I'd like to have a type feature that is a list of key-value pairs. The number 
 of pairs is unknown. What's best for this? Is it even possible?
 
 Thanks,
 Armin
 



Re: HashMap as type feature

2013-10-16 Thread Thomas Ginter
Armin,

Our team does this with an annotation type designed to store feature vectors 
for Machine Learning applications.  In this case we use a StringArray feature 
for the keys and a StringArray feature for the values.  The StringArrays are 
pulled from a HashMap<String, String> vector variable and inserted into the 
features with the following code:

int size = vector.size();
StringArray keys = new StringArray(jcas, size);
StringArray values = new StringArray(jcas, size);
keys.copyFromArray(vector.keySet().toArray(new String[size]), 0, 0, size);
values.copyFromArray(vector.values().toArray(new String[size]), 0, 0, size);

Retrieving the values is fairly straightforward.  If you are using a static 
annotation type it can be as simple as:

StringArray keys = vector.getKeysArray();

If you parameterize the annotation type in the annotator, you can use the name 
of the feature to get a Feature object reference, then pull the StringArrays 
like so:

//parameter is the fully qualified name of the Annotation type
Type annotationTypeObj = aJCas.getRequiredType("com.my.Annotation");
//the actual name of the feature storing the key StringArray reference
Feature keyFeature = annotationTypeObj.getFeatureByBaseName(keyFeatureName);
//the name of the values feature
Feature valuesFeature = annotationTypeObj.getFeatureByBaseName(valuesFeatureName);

//Get a list of the annotation objects in the CAS, then iterate through the 
//list; for each annotation 'a' do the following to retrieve the keys and values

StringArray keys = (StringArray) a.getFeatureValue(keyFeature);
StringArray values = (StringArray) a.getFeatureValue(valuesFeature);

If necessary you can retrieve a String[] from the StringArray FeatureStructure 
by calling the .toArray() method such as:

String[] keysArray = keys.toArray();
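Going the other way, the two String[] arrays obtained with keys.toArray() and values.toArray() can be zipped back into a map. A minimal, UIMA-free sketch so it runs on its own; the class name FeatureVectorMaps is hypothetical, and LinkedHashMap is chosen so iteration follows the stored array order.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public final class FeatureVectorMaps {
    /** Zips parallel key/value arrays back into a map, preserving array order. */
    public static Map<String, String> fromParallelArrays(String[] keys, String[] values) {
        if (keys.length != values.length) {
            throw new IllegalArgumentException("keys and values differ in length");
        }
        Map<String, String> map = new LinkedHashMap<>();
        for (int i = 0; i < keys.length; i++) {
            map.put(keys[i], values[i]);
        }
        return map;
    }
}
```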

Let me know if you have any questions.

Thanks,

Thomas Ginter
801-448-7676
thomas.gin...@utah.edu




On Oct 16, 2013, at 9:55 AM, Dr. Armin Wegner 
arminweg...@googlemail.com wrote:

Hi,

I'd like to have a type feature that is a list of key-value pairs. The
number of pairs is unknown. What's best for this? Is it even possible?

Thanks,
Armin



Re: SimpleServer, instantiating CAS with custom typesystem?

2013-02-19 Thread Thomas Ginter
Helen,

You might also consider using UIMA-AS instead.  UIMA-AS allows you to deploy a 
service (your AAE) that can be remotely accessed by UIMA-AS clients on other 
machines or in other JVMs for scalable deployments.  Each client provides a 
CollectionReader to supply documents to the service and a Listener to catch 
return events from the service to know when processing is complete.  You can 
find some additional getting started information about UIMA-AS at the following:

http://uima.apache.org/doc-uimaas-what.html
  

Thanks,

Thomas Ginter
801-448-7676
thomas.gin...@utah.edu




On Feb 19, 2013, at 7:04 AM, Helen Johnson -X (heljohns - Infobahn Softworld 
Inc at Cisco) heljo...@cisco.com wrote:

 Thanks for your reply, Jens.
 
 I admit I had been avoiding setting the text of the CAS to be the entire XML 
 string I get back from the first REST service, because it is a massive string 
 and I only want a couple of nodes from that XML string to be processed 
 throughout the UIMA pipeline. But I see your point.
 
 So then, in this new AE, I retrieve the entire XML string from the CAS and do 
 the zone-information processing from the specific nodes of the XML. I assume 
 it is straightforward to then reset the CAS text to be just the text I have 
 found in the original XML.  Specifically, I would use CAS.reset() to empty 
 the CAS of the original (full XML) text, then jCAS.setDocumentText() with the 
 new string of just the relevant text, as well as load all the doc-zone 
 annotations at this point. Is this right?
 
 Cheers,
 Helen
 
 -Original Message-
 From: Jens Grivolla [mailto:j+...@grivolla.net] 
 Sent: Tuesday, February 19, 2013 3:20 AM
 To: user@uima.apache.org
 Subject: Re: SimpleServer,  instantiating CAS with custom typesystem?
 
 Hi, SimpleServer itself is in a way your CR, creating a CAS with the document 
 text you sent. Why do you want to change SimpleServer? It seems that you only 
 want to add annotations to the CAS, not fundamentally change how the CAS is 
 created.
 
 It seems to me that it would be far easier to just create an AE that adds 
 those annotations. Then you won't have any typesystem issues either, since 
 the AE would have the appropriate typesystem.
 
 HTH,
 Jens
 
 On 02/18/2013 10:37 PM, Helen Johnson -X (heljohns - Infobahn Softworld Inc 
 at Cisco) wrote:
 I'm stumped:
 
 I have a UIMA pipeline that starts with a CollectionReader that
 
 -  reads XML input (response from a REST service),
 
 -  identifies a couple of relevant XML nodes
 
 -  makes document-level annotations from the relevant nodes (title, 
 document body, footnote section)
 From there, the AnalysisEngine portion of the pipeline has many AEs that 
 I've wrapped into a single AggregateAnalysisEngine.
 The CollectionReader and the AAE all work correctly in this pipeline.
 
 Now I need to transfer this pipeline into a SimpleServer REST service 
 environment.
 I've created a PEAR of the AAE portion of the pipeline, but I can't include 
 the CollectionReader in this PEAR.
 First question:
 It is my understanding that the CR cannot be included in the PEAR for the 
 SimpleServer. Am I correct in this?
 
 In order to get those document-zoning annotations of title, body and footnote, 
 I have added some methods to the Service.java class in the SimpleServer 
 package that do the XML parsing and then add these annotations 
 to the JCas before the AAE is called. The error that is being thrown at this 
 point is this:
 
 The server encountered an internal error (JCas type 
 myPackage.DocClass.ArticleMainTitle used in Java code, but was not 
 declared in the XML type descriptor.) that prevented it from fulfilling this 
 request.
 
 Second question:
 Where is Service.java looking for the typesystem xml file to be? I have 
 tried all of the following, with the same error result:
 
 -  put the typesystem descriptor file, myTSD.xml, in SimpleServer/lib
 
 -  create a jar containing myTSD.xml, put it into SimpleServer/lib 
 and add that to the build path
 
 -  (after the two above attempts), in SimpleServer project 
 properties, add lib to the UIMA CDE Property Page
 
 -  in SimpleServer project properties, in UIMA Type System, point to 
 the myTSD.xml file in lib
 
 -  put myTSD.xml in SimpleServer/WebContent/WEB-INF/lib
 
 -  put the jar containing myTSD.xml in the 
 SimpleServer/WebContent/WEB-INF/lib
 
 -  put myTSD.xml in SimpleServer/WebContent/WEB-INF/resources
 
 Final question:
 When a CAS gets instantiated (or reset, as it does in Service.java), how can 
 I tell it to use a custom typesystem, and where will it look for that 
 typesystem.xml file within the SimpleServer project?
 
 Thank you,
 Helen Johnson
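Both Jens's suggestion (an AE that adds the zone annotations) and Helen's plan need the same core step: pull the relevant nodes out of the XML and record where each one lands in the new plain text. A minimal sketch using only the JDK's DOM parser; the class ZoneExtractor and the element names passed in (e.g. title, body, footnotes) are hypothetical. The UIMA side, setting the document text and creating one annotation per recorded [begin, end) span, is left out because it depends on your type system.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public final class ZoneExtractor {
    /** One extracted zone: its label and [begin, end) offsets in the plain text. */
    public static final class Zone {
        public final String label;
        public final int begin;
        public final int end;

        Zone(String label, int begin, int end) {
            this.label = label;
            this.begin = begin;
            this.end = end;
        }
    }

    public final StringBuilder text = new StringBuilder();
    public final List<Zone> zones = new ArrayList<>();

    /** Appends the text of every element with the given tag names, in the order given. */
    public static ZoneExtractor extract(String xml, String... tagNames) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document doc = builder.parse(new InputSource(new StringReader(xml)));
        ZoneExtractor result = new ZoneExtractor();
        for (String tag : tagNames) {
            NodeList nodes = doc.getElementsByTagName(tag);
            for (int i = 0; i < nodes.getLength(); i++) {
                String content = nodes.item(i).getTextContent();
                int begin = result.text.length();
                result.text.append(content);
                result.zones.add(new Zone(tag, begin, result.text.length()));
                result.text.append("\n\n"); // separator, not covered by any zone
            }
        }
        return result;
    }
}
```

Everything outside the requested elements is dropped, so the resulting text stays small even when the REST response is large.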
 
 
 
 



Re: CollectionProcessComplete Event thrown with Outstanding CAS Count

2012-06-20 Thread Thomas Ginter
Thanks, Jerry.  BTW, will we be seeing a UIMA-AS 2.4.0 sometime soon?

Thanks,

Thomas Ginter
801-448-7676
thomas.gin...@utah.edu




On Jun 20, 2012, at 1:03 PM, Jaroslaw Cwiklik wrote:

 I've checked the code and indeed this is a bug in the uima-as client when
 running with a CR. As soon as the CR
 returns false from hasNext(), the uima-as client process() method calls
 collectionProcessComplete().
 The fix for this is to wait until all outstanding CASes are processed
 before calling collectionProcessComplete().
 I will fix the trunk in a day or two.
 
 To deal with this problem, you can run the CR outside of the uima-as client and
 call either the send() or sendAndReceive()
 methods to process your CASes. Alternatively, if you want to patch 2.3.1,
 you can modify the process() method as follows:
 
 while (initialized && running) {
   try {
     if ((hasNext = collectionReader.hasNext()) == true) {
       cas = getCAS();
       collectionReader.getNext(cas);
       sendCAS(cas);
     } else {
       break;
     }
   } catch (Exception e) {
     e.printStackTrace();
   }
 }
 Object waitMonitor = new Object();
 if (hasNext == false) {
   while (running && clientCache.size() > 0) {
     try {
       // polling loop waiting for outstanding CASes to come back from the service
       synchronized (waitMonitor) {
         waitMonitor.wait(100);
       }
     } catch (Exception exx) { }
   }
   collectionProcessingComplete();
 }
 
 Jerry
 
 On Thu, Jun 14, 2012 at 9:29 PM, Thomas Ginter thomas.gin...@utah.edu wrote:
 
 My UIMA-AS 2.3.1 service is returning the CollectionProcessComplete event
 while there are still CAS objects outstanding.  The client log shows:
 
 INFO: Client in CollecitonProcessComplete - OutstandingCasCount=2
 TotalCasRequestsSentBetweenCpCs=
 
 I always seem to end up losing 2 CAS objects because the
 UimaAsynchronousEngine object stops blocking the process() method when the
 CollectionProcessComplete event is returned.  My program then calls the
 stop() method, assuming the entire collection has finished processing.  This
 is a problem because the stop() method appears to disconnect from the
 service before the listener can process the last two CAS objects.
 
 Is there a setting I am missing to give the client more time to handle
 entityProcessComplete events?  What I have found in the documentation so
 far refers to input queues for remote delegates only.
 
 Thanks,
 
 Thomas Ginter
 801-448-7676
 thomas.gin...@utah.edu
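Jerry's patch polls clientCache.size() every 100 ms. If you instead take his other suggestion and drive the CR yourself through the send calls, a counter that is incremented per send and decremented in your listener's entityProcessComplete callback can tell you when it is safe to call stop(). Below is a minimal sketch of such a counter using wait/notifyAll rather than polling. OutstandingCasTracker is a hypothetical helper, not part of UIMA-AS, and exactly where you call sent() and returned() in your client is an assumption.

```java
public final class OutstandingCasTracker {
    private int outstanding = 0;

    /** Call after each CAS is handed to the service. */
    public synchronized void sent() {
        outstanding++;
    }

    /** Call from the listener when a CAS comes back (success or failure). */
    public synchronized void returned() {
        if (outstanding > 0) {
            outstanding--;
        }
        if (outstanding == 0) {
            notifyAll(); // wake any thread blocked in awaitDrained()
        }
    }

    /** Blocks until every sent CAS has returned; false if the timeout elapses first. */
    public synchronized boolean awaitDrained(long timeoutMillis) throws InterruptedException {
        long deadline = System.nanoTime() + timeoutMillis * 1_000_000L;
        while (outstanding > 0) {
            long remainingMillis = (deadline - System.nanoTime()) / 1_000_000L;
            if (remainingMillis <= 0) {
                return false;
            }
            wait(remainingMillis); // loop guards against spurious wakeups
        }
        return true;
    }

    public synchronized int outstanding() {
        return outstanding;
    }
}
```

Calling awaitDrained() before stop() makes the client wait for the last CASes instead of disconnecting while they are still in flight; the timeout keeps a lost reply from hanging the client forever.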
 
 
 
 
 



Re: Exception thrown during CAS serialization for Remote UIMA-AS Service

2012-06-14 Thread Thomas Ginter
Jorn,

Thanks for the link to that section of the documentation.  The mention of the 
XMLUtils class was just what I needed.  I wrote an XmlFilter class that uses 
XMLUtils to detect invalid XML characters and replace them with spaces, so that 
our annotation offsets still match the original text.  I was thinking 
about the issue all wrong: I had assumed that all ASCII-8 characters are also 
valid XML 1.0 characters.
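A filter in the spirit of the one described above can be sketched without UIMA at all. This is not the actual XmlFilter: instead of calling UIMA's XMLUtils, it checks each character against the XML 1.0 Char production (#x9, #xA, #xD, #x20-#xD7FF, #xE000-#xFFFD, #x10000-#x10FFFF) and replaces anything else with a space, so the output has the same length as the input and CAS offsets stay aligned.

```java
public final class XmlFilter {
    /** True if the BMP code unit is allowed by the XML 1.0 Char production. */
    private static boolean isXml10Char(char c) {
        return c == 0x9 || c == 0xA || c == 0xD
                || (c >= 0x20 && c <= 0xD7FF)
                || (c >= 0xE000 && c <= 0xFFFD);
    }

    /**
     * Replaces characters that are illegal in XML 1.0 with a space. Valid
     * surrogate pairs (code points >= 0x10000) are kept; lone surrogates are
     * replaced. The result has the same length as the input.
     */
    public static String replaceInvalidXmlChars(String s) {
        StringBuilder out = new StringBuilder(s.length());
        int i = 0;
        while (i < s.length()) {
            char c = s.charAt(i);
            if (Character.isHighSurrogate(c) && i + 1 < s.length()
                    && Character.isLowSurrogate(s.charAt(i + 1))) {
                out.append(c).append(s.charAt(i + 1)); // keep the valid pair intact
                i += 2;
            } else {
                out.append(isXml10Char(c) ? c : ' ');
                i++;
            }
        }
        return out.toString();
    }
}
```

Applying this to the document text before it is set on the CAS would, for example, turn the 0x1a character from the stack trace below into a space.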

Thanks,

Thomas Ginter
801-448-7676
thomas.gin...@utah.edu




On Jun 14, 2012, at 3:52 PM, Jörn Kottmann wrote:

 You are writing a string to the CAS that contains a non-XML character.
 This character cannot be serialized into XMI, and that's what this exception 
 is about.
 
 Have a look at our documentation explaining the issue:
 http://uima.apache.org/d/uimaj-2.4.0/tutorials_and_users_guides.html#ugr.tug.xmi_emf.xml_character_issues
 
 Hope that helps,
 Jörn
 
 On 06/14/2012 11:39 PM, Thomas Ginter wrote:
 We are getting an odd error while trying to process large datasets using 
 UIMA-AS 2.3.1.  There is an exception thrown by the XmiCasSerializer in the 
 Client when it is in the process of serializing a CAS to be sent to a remote 
 service.  The exception is as follows:
 
 org.apache.uima.resource.ResourceProcessException
   at org.apache.uima.adapter.jms.client.BaseUIMAAsynchronousEngineCommon_impl.sendCAS(BaseUIMAAsynchronousEngineCommon_impl.java:854)
   at org.apache.uima.adapter.jms.client.BaseUIMAAsynchronousEngineCommon_impl.sendCAS(BaseUIMAAsynchronousEngineCommon_impl.java:885)
   at org.apache.uima.adapter.jms.client.BaseUIMAAsynchronousEngineCommon_impl.process(BaseUIMAAsynchronousEngineCommon_impl.java:734)
   at gov.va.vinci.flap.Client.run(Client.java:181)
   at gov.va.vinci.density.DensityClient.main(DensityClient.java:137)
 Caused by: org.xml.sax.SAXParseException: Trying to serialize non-XML 1.0 character: _, 0x1a
   at org.apache.uima.util.XMLSerializer$CharacterValidatingContentHandler.checkForInvalidXmlChars(XMLSerializer.java:254)
   at org.apache.uima.util.XMLSerializer$CharacterValidatingContentHandler.startElement(XMLSerializer.java:174)
   at org.apache.uima.cas.impl.XmiCasSerializer$XmiCasDocSerializer.startElement(XmiCasSerializer.java:1003)
   at org.apache.uima.cas.impl.XmiCasSerializer$XmiCasDocSerializer.encodeFS(XmiCasSerializer.java:755)
   at org.apache.uima.cas.impl.XmiCasSerializer$XmiCasDocSerializer.encodeIndexed(XmiCasSerializer.java:700)
   at org.apache.uima.cas.impl.XmiCasSerializer$XmiCasDocSerializer.serialize(XmiCasSerializer.java:268)
   at org.apache.uima.cas.impl.XmiCasSerializer$XmiCasDocSerializer.access$700(XmiCasSerializer.java:108)
   at org.apache.uima.cas.impl.XmiCasSerializer.serialize(XmiCasSerializer.java:1539)
   at org.apache.uima.aae.UimaSerializer.serializeCasToXmi(UimaSerializer.java:136)
   at org.apache.uima.adapter.jms.client.BaseUIMAAsynchronousEngineCommon_impl.serializeCAS(BaseUIMAAsynchronousEngineCommon_impl.java:260)
   at org.apache.uima.adapter.jms.client.BaseUIMAAsynchronousEngineCommon_impl.sendCAS(BaseUIMAAsynchronousEngineCommon_impl.java:779)
   ... 4 more
 
 It happens at apparently random points while processing the corpus, and it is 
 never actually thrown; it is simply written to stderr.  Also, the 
 serializer never seems to return, which means the 
 UimaAsynchronousEngine.process() method never returns and the client simply 
 hangs until it is manually terminated.  To try to resolve this issue I have 
 implemented text filters for the incoming CAS data to exclude anything outside 
 the ASCII-8 range.  I have also tried switching the server and client to 
 binary serialization strategies, but that causes the XmiCasSerializer in my 
 UimaAsBaseListener object to return errors when attempting to serialize CAS 
 objects received in the entityProcessComplete event.
 
 Any suggestions from the UIMA masters?  How can I debug further so that I 
 can find out (A) where this illegal character is coming from and (B) how I can 
 prevent it from happening?
 
 Thanks,
 
 Thomas Ginter
 801-448-7676
 thomas.gin...@utah.edu
 
 
 
 
 
 



CollectionProcessComplete Event thrown with Outstanding CAS Count

2012-06-14 Thread Thomas Ginter
My UIMA-AS 2.3.1 service is returning the CollectionProcessComplete event while 
there are still CAS objects outstanding.  The client log shows:

INFO: Client in CollecitonProcessComplete - OutstandingCasCount=2 
TotalCasRequestsSentBetweenCpCs=

I always seem to end up losing 2 CAS objects because the UimaAsynchronousEngine 
object stops blocking the process() method when the CollectionProcessComplete 
event is returned.  My program then calls the stop() method, assuming the 
entire collection has finished processing.  This is a problem because the stop() 
method appears to disconnect from the service before the listener can 
process the last two CAS objects. 

Is there a setting I am missing to give the client more time to handle 
entityProcessComplete events?  What I have found in the documentation so far 
refers to input queues for remote delegates only.

Thanks,

Thomas Ginter
801-448-7676
thomas.gin...@utah.edu






Re: Maven UIMA and import by name

2012-05-11 Thread Thomas Ginter
We use maven for our UIMA-AS projects.  Here is the build section from our 
standard POM entries:

<build>
  <resources>
    <resource>
      <directory>src/main/desc/</directory>
    </resource>
    <resource>
      <directory>src/main/resources/</directory>
    </resource>
  </resources>
  <pluginManagement>
    <plugins>
      <plugin>
        <groupId>org.apache.maven.plugins</groupId>
        <artifactId>maven-compiler-plugin</artifactId>
        <configuration>
          <source>1.6</source>
          <target>1.6</target>
        </configuration>
      </plugin>
    </plugins>
  </pluginManagement>
</build>

This adds the desc and resources directories as resource directories, so their 
contents are copied onto the classpath, which lets you resolve descriptor imports by name.
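As I understand UIMA's import-by-name lookup, the name is turned into a classpath resource path by replacing dots with slashes and appending .xml. With the POM above, a descriptor at src/main/desc/com/example/MyAnnotator.xml (hypothetical path) ends up in target/classes and would be importable as com.example.MyAnnotator. The small helper below is hypothetical, not part of UIMA; it makes that mapping explicit so a unit test can check whether a descriptor will resolve.

```java
public final class ImportByName {
    /** Maps an import-by-name value such as "com.example.MyAnnotator" to "com/example/MyAnnotator.xml". */
    public static String toResourcePath(String importName) {
        return importName.replace('.', '/') + ".xml";
    }

    /** True if the descriptor for importName can be found through the given class loader. */
    public static boolean isOnClasspath(String importName, ClassLoader loader) {
        return loader.getResource(toResourcePath(importName)) != null;
    }
}
```

For example, ImportByName.isOnClasspath("com.example.MyAnnotator", getClass().getClassLoader()) inside a test would fail fast if the desc directory is not being copied as a resource.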

Thanks,

Thomas Ginter
801-448-7676
thomas.gin...@utah.edu




On May 11, 2012, at 9:56 AM, Erik Fäßler wrote:

 Hello all,
 
 I have a question on how you deal with a specific use case and would like to 
 know if you have some suggestions for me.
 
 I use Maven for all my Java projects, and I do so for my UIMA-related 
 projects as well. Now I have quite a large pipeline with lots of descriptors. They 
 reside in the 'desc' directory (or its subdirectories) of the 'UIMA nature' 
 structure.
 Currently I am about to pack these single-AE descriptors into aggregates. For 
 importing all single-AEs into the AAE descriptor, I would like to use import 
 by name. However, the 'desc' directory is not a library for eclipse and 
 thus, the AAE descriptor editor doesn't list the descriptors residing in this 
 directory - I can't add them (and when I edit the XML, I get error messages 
 about descriptors not found).
 
 I would like to just add the 'desc' directory to the build path as a class 
 folder (not a source folder, this won't work), i.e. as a library. When I do 
 this manually, Maven overwrites it the next time it updates my project 
 configuration.
 
 Do you have any ideas here? Do you use 'import by name' for your PEARs? Do you 
 just live with the error messages and edit the XML directly?
 
 Just would like to know how you do it - and if anyone knows a way to tell 
 maven that 'desc' should be a library, I'd be glad :-)
 
 Best regards,
 
   Erik



Re: Running UIMA on a cluster

2012-04-27 Thread Thomas Ginter
UIMA-AS was created to handle the message passing, job distribution, etc.  Try 
going through the UIMA-AS documentation first.  We have had pretty good success 
using it here.

Thanks,

Thomas Ginter
801-448-7676
thomas.gin...@utah.edu




On Apr 27, 2012, at 1:35 PM, John David Osborne wrote:

 Hello,
 
 Is there any best practice documentation out there for running
 UIMA/UIMA-AS on a cluster? I have only run single machine instances of
 UIMA (mostly through Eclipse) and have not investigated the ability to
 perform multiple simultaneous analyses in order to process large document
 collections.
 
 It's not clear to me how UIMA would operate in a cluster environment. Do
 people really do message passing using JMS? I'm guessing this is the case,
 as I'm not seeing references to MPICH, SGE or other things I am more used to.
 I've looked through some of the documentation (including all of the Overview
 and SDK Setup) but am not finding anything helpful. I've also tried googling,
 but I am not getting much except this:
 http://comments.gmane.org/gmane.comp.apache.uima.general/2131 which makes
 me think it is possible.
 
 Currently with my level of confusion I think it may be best to have
 multiple instances of UIMA on a cluster and just submit jobs processing
 discrete document sets to our SGE cluster and ignore whatever scaling
 features are actually present in UIMA since the document processing I plan
 to do is data parallel.
 
 -John
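For the data-parallel fallback John describes (independent single-machine UIMA runs over discrete document sets), the only coordination needed is deciding which documents each job processes. A minimal round-robin sharding sketch; the class name Shards is hypothetical, and reading the task index from SGE's SGE_TASK_ID environment variable (1-based in array jobs) is an assumption about how the jobs are launched.

```java
import java.util.ArrayList;
import java.util.List;

public final class Shards {
    /** Returns the items whose index % taskCount == taskIndex (taskIndex is 0-based). */
    public static <T> List<T> shard(List<T> items, int taskIndex, int taskCount) {
        if (taskCount <= 0 || taskIndex < 0 || taskIndex >= taskCount) {
            throw new IllegalArgumentException("need 0 <= taskIndex < taskCount");
        }
        List<T> mine = new ArrayList<>();
        for (int i = taskIndex; i < items.size(); i += taskCount) {
            mine.add(items.get(i));
        }
        return mine;
    }
}
```

In an SGE array job each task might compute taskIndex = Integer.parseInt(System.getenv("SGE_TASK_ID")) - 1 and feed only its shard to a local collection reader. Round-robin keeps shard sizes within one document of each other; it does not balance by document length.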