Hi James,

There are many ways to deploy cTAKES for processing a large number of documents; it really depends on the application. However, if you plan to run cTAKES in its own process/JVM, you may want to take a look at UIMA-AS[1], as it provides an extremely flexible architecture via async messaging.
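If cTAKES has to stay inside a multithreaded host for now, one possible stopgap is to funnel every request through a single worker thread. The sketch below is plain Java and only illustrates that idea. It assumes the single thread restriction means one pipeline instance must not be called concurrently; `SingleThreadedGate` and the `Function<String, String>` standing in for the pipeline call are hypothetical names, not cTAKES or UIMA API.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.Function;

// Hypothetical helper: serializes calls to a component that is not thread-safe.
final class SingleThreadedGate implements AutoCloseable {
    // One daemon worker thread; every document is processed on this thread only.
    private final ExecutorService worker = Executors.newSingleThreadExecutor(r -> {
        Thread t = new Thread(r, "pipeline-worker");
        t.setDaemon(true);
        return t;
    });

    // Stand-in for the real analysis call (assumption, not a cTAKES type).
    private final Function<String, String> pipeline;

    SingleThreadedGate(Function<String, String> pipeline) {
        this.pipeline = pipeline;
    }

    // Safe to call from many threads; callers block until their document is done.
    String process(String documentText) {
        Callable<String> task = () -> pipeline.apply(documentText);
        try {
            return worker.submit(task).get();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for the pipeline", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("pipeline failed", e.getCause());
        }
    }

    @Override
    public void close() {
        worker.shutdown();
    }
}
```

This caps throughput at one document at a time per JVM, so for large volumes it would still need to be combined with several processes or a scale-out layer.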
[1] http://uima.apache.org/d/uima-as-2.4.0/uima_async_scaleout.html

> -----Original Message-----
> From: Vogel, James [mailto:[email protected]]
> Sent: Friday, September 06, 2013 9:44 AM
> To: [email protected]
> Subject: RE: Package for use with Solr
>
> What is the recommended way to deploy cTAKES for processing of a large
> number of documents on a regular basis while dealing with the single thread
> restriction?
>
> Can anyone point me to any examples of using SolrCas with cTAKES?
>
> -----Original Message-----
> From: Pei Chen [mailto:[email protected]]
> Sent: Thursday, September 05, 2013 2:40 PM
> To: [email protected]
> Subject: Re: Package for use with Solr
>
> James,
> Also- If you plan to run this in the same JVM process as Solr or other
> multithreaded webapp setup- be sure to look out for Jira CTAKES-151 [1]
> https://issues.apache.org/jira/browse/CTAKES-151
>
> On Thu, Sep 5, 2013 at 10:38 AM, Pei Chen <[email protected]> wrote:
> > James,
> > Ensure that dir is in your classpath. To test out the theory, try
> > making that path an {absolute_path} instead of res.
> >
> > Note, if you're trying to run this under a servlet container, the
> > webapp could have a different class loader than the container.
> >
> > On Thu, Sep 5, 2013 at 9:37 AM, Vogel, James <[email protected]> wrote:
> >> hsqldb is throwing java.sql.SQLException: File input/output error:
> >> java.io.IOException: Stream closed at the following code in
> >> JdbcConnectionResourceImpl.load:
> >>
> >> iv_conn = DriverManager.getConnection(
> >>         urlStr,
> >>         username,
> >>         password);
> >>
> >> urlStr is:
> >> jdbc:hsqldb:res:org/apache/ctakes/dictionary/lookup/umls2011ab/umls
> >>
> >> username is: SA
> >>
> >> password is: blank
> >>
> >> That path is expanded as regular files along with the other resources
> >> off of my default directory and on the classpath. Is there something
> >> special about the placement of the hsqldb files or something I need
> >> to do to enable use of hsqldb?
> >>
> >> Exception details:
> >>
> >> Caused by: org.apache.uima.resource.ResourceInitializationException
> >>     at org.apache.ctakes.core.resource.JdbcConnectionResourceImpl.load(JdbcConnectionResourceImpl.java:130)
> >>     at org.apache.uima.resource.impl.ResourceManager_impl.registerResource(ResourceManager_impl.java:611)
> >>     at org.apache.uima.resource.impl.ResourceManager_impl.initializeExternalResources(ResourceManager_impl.java:450)
> >>     at org.apache.uima.resource.Resource_ImplBase.initialize(Resource_ImplBase.java:182)
> >>     at org.apache.uima.analysis_engine.impl.AnalysisEngineImplBase.initialize(AnalysisEngineImplBase.java:157)
> >>     at org.apache.uima.analysis_engine.impl.PrimitiveAnalysisEngine_impl.initialize(PrimitiveAnalysisEngine_impl.java:123)
> >>     at org.apache.uima.impl.AnalysisEngineFactory_impl.produceResource(AnalysisEngineFactory_impl.java:94)
> >>     at org.apache.uima.impl.CompositeResourceFactory_impl.produceResource(CompositeResourceFactory_impl.java:62)
> >>     at org.apache.uima.UIMAFramework.produceResource(UIMAFramework.java:269)
> >>     at org.apache.uima.UIMAFramework.produceAnalysisEngine(UIMAFramework.java:387)
> >>     at org.apache.uima.analysis_engine.asb.impl.ASB_impl.setup(ASB_impl.java:255)
> >>     at org.apache.uima.analysis_engine.impl.AggregateAnalysisEngine_impl.initASB(AggregateAnalysisEngine_impl.java:429)
> >>     at org.apache.uima.analysis_engine.impl.AggregateAnalysisEngine_impl.initializeAggregateAnalysisEngine(AggregateAnalysisEngine_impl.java:373)
> >>     at org.apache.uima.analysis_engine.impl.AggregateAnalysisEngine_impl.initialize(AggregateAnalysisEngine_impl.java:186)
> >>     at org.apache.uima.impl.AnalysisEngineFactory_impl.produceResource(AnalysisEngineFactory_impl.java:94)
> >>     at org.apache.uima.impl.CompositeResourceFactory_impl.produceResource(CompositeResourceFactory_impl.java:62)
> >>     at org.apache.uima.UIMAFramework.produceResource(UIMAFramework.java:269)
> >>     at org.apache.uima.UIMAFramework.produceAnalysisEngine(UIMAFramework.java:354)
> >>     at org.apache.lucene.analysis.uima.ae.BasicAEProvider.getAE(BasicAEProvider.java:73)
> >>     at org.apache.solr.uima.processor.UIMAUpdateRequestProcessor.processText(UIMAUpdateRequestProcessor.java:155)
> >>     at org.apache.solr.uima.processor.UIMAUpdateRequestProcessor.processAdd(UIMAUpdateRequestProcessor.java:80)
> >>     ... 40 more
> >> Caused by: java.sql.SQLException: File input/output error: java.io.IOException: Stream closed
> >>     at org.hsqldb.jdbc.Util.sqlException(Unknown Source)
> >>     at org.hsqldb.jdbc.jdbcConnection.<init>(Unknown Source)
> >>     at org.hsqldb.jdbcDriver.getConnection(Unknown Source)
> >>     at org.hsqldb.jdbcDriver.connect(Unknown Source)
> >>     at java.sql.DriverManager.getConnection(Unknown Source)
> >>     at java.sql.DriverManager.getConnection(Unknown Source)
> >>     at org.apache.ctakes.core.resource.JdbcConnectionResourceImpl.load(JdbcConnectionResourceImpl.java:109)
> >>     ... 60 more
> >>
> >> From: Chen, Pei [mailto:[email protected]]
> >> Sent: Thursday, August 29, 2013 5:57 PM
> >> To: <[email protected]>
> >> Cc: [email protected]
> >> Subject: Re: Package for use with Solr
> >>
> >> Sorry if i was unclear- i meant Only resources need to be unpacked.
> >> What was the full stack trace?
> >>
> >> Sent from my iPhone
> >>
> >> On Aug 29, 2013, at 5:52 PM, "Vogel, James" <[email protected]> wrote:
> >>
> >> I do, including ctakes-type-system-3.0.0-incubating.jar. I thought
> >> you said that the xml files needed to be unpacked rather than just in
> >> jars so I was assuming the reason it wasn't being found had to do with that.
> >>
> >> From: Chen, Pei [mailto:[email protected]]
> >> Sent: Thursday, August 29, 2013 5:43 PM
> >> To: [email protected]
> >> Subject: RE: Package for use with Solr
> >>
> >> James,
> >>
> >> Do you have all of the ctakes jars in your classpath?
> >> i.e:
> >> ctakes-type-system-{version}.jar in your classpath?
> >>
> >> --Pei
> >>
> >> From: Vogel, James [mailto:[email protected]]
> >> Sent: Thursday, August 29, 2013 5:37 PM
> >> To: [email protected]
> >> Subject: RE: Package for use with Solr
> >>
> >> Is there something special about where
> >> org\apache\ctakes\typesystem\types\TypeSystem.xml needs to be placed
> >> relative to the other files in the binary distribution? I put a 'resources'
> >> folder on the path containing
> >> org\apache\ctakes\typesystem\types\TypeSystem.xml but it isn't being found.
> >>
> >> From: Pei Chen [mailto:[email protected]]
> >> Sent: Wednesday, August 28, 2013 1:43 PM
> >> To: [email protected]
> >> Subject: Re: Package for use with Solr
> >>
> >> There is a current limitation where the resources need to be unpacked...
> >> (Lucene doesn't like the indexes being inside the compressed jar's.)
> >> Try unpacking the resources and adding resources to your classpath...
> >>
> >> --Pei
> >>
> >> On Wed, Aug 28, 2013 at 12:59 PM, Vogel, James <[email protected]> wrote:
> >>
> >> I renamed apache-ctakes-3.1.0-SNAPSHOT-bin.zip to jar and created a
> >> jar containing the contents of ctakes-dictionary-lookup\resources and
> >> put them on the classpath. When I don't add the resources jar I get
> >> an error because it can't find those files. Once I get past that, I
> >> get a java.lang.IllegalArgumentException: URI is not hierarchical
> >> exception, full stack trace below. I saw some posts from 2012
> >> (http://ctakes.markmail.org/search/?q=URI+is+not+hierarchical#query:URI%20is%20not%20hierarchical+page:1+mid:yaqtqzbylwdyy35n+state:results)
> >> where you wrote about a similar problem. Any suggestions on why
> >> this happens?
> >>
> >> 2013-08-28 16:41:51,692 ERROR core.SolrCore -
> >> org.apache.solr.common.SolrException: processing error URI is not
> >> hierarchical.
> >> id=file:/C:/Users/jvogel/Documents/data/uima_tests/drug%20name%20test%20v1.txt,
> >> text="George Washington was aspirin exposed to butenafine
> >> hydrochloride which is a more complex drug refer..."
> >>     at org.apache.solr.uima.processor.UIMAUpdateRequestProcessor.processAdd(UIMAUpdateRequestProcessor.java:118)
> >>     at org.apache.solr.update.processor.UpdateRequestProcessor.processAdd(UpdateRequestProcessor.java:51)
> >>     at com.lucid.update.FieldMappingProcessor.processAdd(FieldMappingUpdateProcessorFactory.java:98)
> >>     at org.apache.solr.update.processor.LogUpdateProcessor.processAdd(LogUpdateProcessorFactory.java:100)
> >>     at org.apache.solr.handler.loader.XMLLoader.processUpdate(XMLLoader.java:246)
> >>     at org.apache.solr.handler.loader.XMLLoader.load(XMLLoader.java:173)
> >>     at org.apache.solr.handler.UpdateRequestHandler$1.load(UpdateRequestHandler.java:92)
> >>     at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:74)
> >>     at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
> >>     at org.apache.solr.core.SolrCore.execute(SolrCore.java:1816)
> >>     at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:448)
> >>     at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:269)
> >>     at com.google.inject.servlet.FilterDefinition.doFilter(FilterDefinition.java:163)
> >>     at com.google.inject.servlet.FilterChainInvocation.doFilter(FilterChainInvocation.java:58)
> >>     at com.google.inject.servlet.ManagedFilterPipeline.dispatch(ManagedFilterPipeline.java:118)
> >>     at com.google.inject.servlet.GuiceFilter.doFilter(GuiceFilter.java:113)
> >>     at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1337)
> >>     at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:484)
> >>     at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:119)
> >>     at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:524)
> >>     at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:233)
> >>     at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1065)
> >>     at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:413)
> >>     at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:192)
> >>     at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:999)
> >>     at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:117)
> >>     at org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:250)
> >>     at org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:149)
> >>     at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:111)
> >>     at org.eclipse.jetty.server.Server.handle(Server.java:351)
> >>     at org.eclipse.jetty.server.AbstractHttpConnection.handleRequest(AbstractHttpConnection.java:454)
> >>     at org.eclipse.jetty.server.BlockingHttpConnection.handleRequest(BlockingHttpConnection.java:47)
> >>     at org.eclipse.jetty.server.AbstractHttpConnection.content(AbstractHttpConnection.java:900)
> >>     at org.eclipse.jetty.server.AbstractHttpConnection$RequestHandler.content(AbstractHttpConnection.java:954)
> >>     at org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:952)
> >>     at org.eclipse.jetty.http.HttpParser.parseAvailable(HttpParser.java:230)
> >>     at org.eclipse.jetty.server.BlockingHttpConnection.handle(BlockingHttpConnection.java:66)
> >>     at org.eclipse.jetty.server.bio.SocketConnector$ConnectorEndPoint.run(SocketConnector.java:254)
> >>     at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:599)
> >>     at org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:534)
> >>     at java.lang.Thread.run(Unknown Source)
> >> Caused by: java.lang.IllegalArgumentException: URI is not hierarchical
> >>     at java.io.File.<init>(Unknown Source)
> >>     at org.apache.ctakes.core.resource.FileResourceImpl.load(FileResourceImpl.java:44)
> >>     at org.apache.uima.resource.impl.ResourceManager_impl.registerResource(ResourceManager_impl.java:603)
> >>     at org.apache.uima.resource.impl.ResourceManager_impl.initializeExternalResources(ResourceManager_impl.java:442)
> >>     at org.apache.uima.resource.Resource_ImplBase.initialize(Resource_ImplBase.java:146)
> >>     at org.apache.uima.analysis_engine.impl.AnalysisEngineImplBase.initialize(AnalysisEngineImplBase.java:157)
> >>     at org.apache.uima.analysis_engine.impl.PrimitiveAnalysisEngine_impl.initialize(PrimitiveAnalysisEngine_impl.java:122)
> >>     at org.apache.uima.impl.AnalysisEngineFactory_impl.produceResource(AnalysisEngineFactory_impl.java:94)
> >>     at org.apache.uima.impl.CompositeResourceFactory_impl.produceResource(CompositeResourceFactory_impl.java:62)
> >>     at org.apache.uima.UIMAFramework.produceResource(UIMAFramework.java:267)
> >>     at org.apache.uima.UIMAFramework.produceAnalysisEngine(UIMAFramework.java:361)
> >>     at org.apache.uima.analysis_engine.asb.impl.ASB_impl.setup(ASB_impl.java:254)
> >>     at org.apache.uima.analysis_engine.impl.AggregateAnalysisEngine_impl.initASB(AggregateAnalysisEngine_impl.java:431)
> >>     at org.apache.uima.analysis_engine.impl.AggregateAnalysisEngine_impl.initializeAggregateAnalysisEngine(AggregateAnalysisEngine_impl.java:375)
> >>     at org.apache.uima.analysis_engine.impl.AggregateAnalysisEngine_impl.initialize(AggregateAnalysisEngine_impl.java:185)
> >>     at org.apache.uima.impl.AnalysisEngineFactory_impl.produceResource(AnalysisEngineFactory_impl.java:94)
> >>     at org.apache.uima.impl.CompositeResourceFactory_impl.produceResource(CompositeResourceFactory_impl.java:62)
> >>     at org.apache.uima.UIMAFramework.produceResource(UIMAFramework.java:267)
> >>     at org.apache.uima.UIMAFramework.produceAnalysisEngine(UIMAFramework.java:335)
> >>     at org.apache.lucene.analysis.uima.ae.BasicAEProvider.getAE(BasicAEProvider.java:73)
> >>     at org.apache.solr.uima.processor.UIMAUpdateRequestProcessor.processText(UIMAUpdateRequestProcessor.java:155)
> >>     at org.apache.solr.uima.processor.UIMAUpdateRequestProcessor.processAdd(UIMAUpdateRequestProcessor.java:80)
> >>     ... 40 more
> >>
> >> From: Pei Chen [mailto:[email protected]]
> >> Sent: Wednesday, August 28, 2013 11:43 AM
> >> To: [email protected]
> >> Subject: Re: Package for use with Solr
> >>
> >> One can download the binaries which has all of the jars and
> >> resources. Or if you build from source, you can run $mvn package
> >> which will generate a convenience zip with all of the jars and
> >> transitive dependencies and resources neatly packaged inside
> >> ctakes-dist/target/.
> >>
> >> Hope that helps-
> >>
> >> Pei
> >>
> >> On Tue, Aug 27, 2013 at 6:07 PM, Vogel, James <[email protected]> wrote:
> >>
> >> I'm not yet planning on SolrCAS because I'm not familiar with it yet.
> >> First I just want to use the UIMAUpdateRequestProcessor to map the
> >> results from specific fields into solr. I'm currently stuck on how
> >> to create jar(s) that contain all the things under the *-res folders
> >> to work around the following error:
> >>
> >> Caused by: java.io.FileNotFoundException:
> >> org\apache\ctakes\dependency\parser\models\lemmatizer\dictionary-1.3.1.jar
> >> (The system cannot find the path specified)
> >>
> >> I'm not familiar with maven. The jars I created via the maven
> >> assembly command only contain the classes. Is there a maven command
> >> to create a ctakes jar(s) that contains all the components (classes,
> >> xmls, dictionaries, etc.)?
> >>
> >> From: Chen, Pei [mailto:[email protected]]
> >> Sent: Tuesday, August 27, 2013 9:58 AM
> >> To: [email protected]
> >> Subject: RE: Package for use with Solr
> >>
> >> Hi James,
> >>
> >> I believe the process of deploying a primitive annotator should be
> >> the same as an aggregate pipeline that contains a fixed flow...
> >>
> >> I presume you would like to append something like the SolrCAS cas
> >> consumer at the end of the pipeline?
> >>
> >> --Pei
> >>
> >> From: Vogel, James [mailto:[email protected]]
> >> Sent: Tuesday, August 27, 2013 7:09 AM
> >> To: [email protected]
> >> Subject: Package for use with Solr
> >>
> >> Any guidance on how to package all the components needed for the
> >> ctakes clinical pipeline AggregatePlaintextUMLSProcessor so that it
> >> can be deployed as part of another application? I'd like to deploy
> >> it to be run by the UIMAUpdateRequestProcessor as part of solr
> >> document indexing. I know how to package a single UIMA annotator by
> >> just referencing the .xml file and putting the jar on the class path
> >> for Solr. I don't know how to do the same for all of the components in the
> >> pipeline.
> >>
> >> ________________________________
> >>
> >> IMPORTANT WARNING: Information contained in this email is intended
> >> for the use of the individual to whom it is addressed, and may
> >> contain information that is privileged, confidential, and exempt from
> >> disclosure under applicable law. If you are not the intended
> >> recipient, or the employee or agent responsible for delivering the
> >> message to the intended recipient, you are hereby notified that any
> >> dissemination, distribution, or copying of this communication is
> >> STRICTLY FORBIDDEN. If you have received this communication in error,
> >> please notify us immediately by return email and delete this document.
> >> Thank you.
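For anyone reading this thread later: the "URI is not hierarchical" message in the trace above comes from `java.io.File.<init>`, and it can be reproduced with plain Java. `new File(URI)` only accepts hierarchical `file:` URIs, while a resource that lives inside a jar is addressed by an opaque `jar:file:...!/...` URI. That would be consistent with the advice to unpack the resources, although the sketch below does not use any cTAKES code. The class name `ResourceLocationCheck` and the example paths are hypothetical.

```java
import java.io.File;
import java.net.URI;

// Hypothetical helper showing why a resource packed in a jar cannot be opened via new File(URI).
final class ResourceLocationCheck {
    // True when the URI points at a plain file on disk, so new File(uri) can accept it.
    static boolean isPlainFile(URI uri) {
        return "file".equalsIgnoreCase(uri.getScheme()) && !uri.isOpaque();
    }

    // Tries the same constructor that appears in the stack trace and reports the outcome.
    static String tryOpenAsFile(URI uri) {
        try {
            return "ok: " + new File(uri).getPath();
        } catch (IllegalArgumentException e) {
            // For jar: URIs the JDK reports "URI is not hierarchical".
            return "failed: " + e.getMessage();
        }
    }
}
```

Calling `tryOpenAsFile(URI.create("jar:file:/tmp/resources.jar!/some/resource.txt"))` takes the failure branch, while a `file:/tmp/resources/some/resource.txt` URI is accepted. Checking a resource URL this way before deployment may help tell whether a file is being picked up from a jar or from an unpacked directory.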
