We're using one of the cTAKES 4.0 convenience binaries. The two wget
commands from my install script are shown below (mirroring the install
instructions):


wget -P /usr/local http://mirrors.sonic.net/apache//ctakes/ctakes-4.0.0/apache-ctakes-4.0.0-bin.tar.gz
wget -P /usr/local http://sourceforge.net/projects/ctakesresources/files/ctakes-resources-4.0-bin.zip


I'm wondering if this is now tied to the serializability of part of the
fast pipeline (as opposed to the default). We're not using the Maven
dependency due to some issues with lvgannotator outlined here:
https://issues.apache.org/jira/browse/CTAKES-445

However, there appears to be a new patch as of three hours ago, so I need
to do some investigating there.
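
For what it's worth, the -7 in the trace quoted below is consistent with
"urlString" being empty when the extension is stripped. Here is a minimal
standalone sketch of that last substring call. It assumes HSQL_DB_EXT is the
7-character string ".script" (the constant's value isn't shown in the
snippet I pasted), and the class name, method names, and file path are made
up for illustration. The checked variant is only a suggestion for getting a
clearer error message, not what cTAKES actually does.

```java
public class SubstringRepro {
    // Assumed value: the quoted snippet does not show how HSQL_DB_EXT is defined.
    static final String HSQL_DB_EXT = ".script";

    // Mirrors the final substring call in the quoted getConnectionUrl method.
    static String stripExtension(final String urlString) {
        return urlString.substring(0, urlString.length() - HSQL_DB_EXT.length());
    }

    // Hypothetical defensive variant: fail with a descriptive message
    // instead of a bare negative index.
    static String stripExtensionChecked(final String urlString) {
        if (urlString == null || !urlString.endsWith(HSQL_DB_EXT)) {
            throw new IllegalArgumentException(
                "Expected a URL ending in " + HSQL_DB_EXT + " but got: '" + urlString + "'");
        }
        return urlString.substring(0, urlString.length() - HSQL_DB_EXT.length());
    }

    public static void main(String[] args) {
        // Hypothetical path, used only to show the normal case.
        System.out.println(stripExtension("file:/usr/local/db/example.script"));

        // An empty string makes the end index 0 - 7 = -7, so substring throws.
        try {
            stripExtension("");
        } catch (StringIndexOutOfBoundsException e) {
            System.out.println("Empty URL string fails: " + e.getMessage());
        }
    }
}
```

If that assumption holds, the question becomes why the resource lookup hands
back an empty URL string on the executors rather than failing with
FileNotFoundException.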

Mike


On Fri, Sep 1, 2017 at 6:56 PM, James Masanz <masanz.ja...@gmail.com> wrote:

> I think that in late April Sean Finan fixed a problem that was resulting in
> Caused by: java.lang.StringIndexOutOfBoundsException: String index out of range: -7
>
> Are you using cTAKES 4.0 (either from the convenience binary download or
> as a maven dependency) or are you using cTAKES in some other way?
>
> -- James
>
>
> On Fri, Sep 1, 2017 at 3:13 PM, Michael Trepanier <m...@metistream.com>
> wrote:
>
>> Hi All,
>>
>> We've been attempting to scale our cTAKES Pipeline on top of Spark, so
>> we've switched from using the "getDefaultPipeline" method to the
>> "getFastPipeline" method to boost the processing speed. However, while the
>> default pipeline works fine with Spark, the fast pipeline is throwing the
>> below error (edited down to the cTAKES portion of the stack trace):
>>
>>
>> Caused by: org.apache.uima.resource.ResourceInitializationException: MESSAGE LOCALIZATION FAILED: Can't find resource for bundle java.util.PropertyResourceBundle, key Could not construct org.apache.ctakes.dictionary.lookup2.dictionary.UmlsJdbcRareWordDictionary
>>         at org.apache.ctakes.dictionary.lookup2.ae.AbstractJCasTermAnnotator.initialize(AbstractJCasTermAnnotator.java:131)
>>         at org.apache.uima.analysis_engine.impl.PrimitiveAnalysisEngine_impl.initializeAnalysisComponent(PrimitiveAnalysisEngine_impl.java:266)
>>         ... 44 more
>> Caused by: org.apache.uima.analysis_engine.annotator.AnnotatorContextException: MESSAGE LOCALIZATION FAILED: Can't find resource for bundle java.util.PropertyResourceBundle, key Could not construct org.apache.ctakes.dictionary.lookup2.dictionary.UmlsJdbcRareWordDictionary
>>         at org.apache.ctakes.dictionary.lookup2.dictionary.DictionaryDescriptorParser.parseDictionary(DictionaryDescriptorParser.java:199)
>>         at org.apache.ctakes.dictionary.lookup2.dictionary.DictionaryDescriptorParser.parseDictionaries(DictionaryDescriptorParser.java:156)
>>         at org.apache.ctakes.dictionary.lookup2.dictionary.DictionaryDescriptorParser.parseDescriptor(DictionaryDescriptorParser.java:128)
>>         at org.apache.ctakes.dictionary.lookup2.ae.AbstractJCasTermAnnotator.initialize(AbstractJCasTermAnnotator.java:129)
>>         ... 45 more
>> Caused by: java.lang.reflect.InvocationTargetException
>>         at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
>>         at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
>>         at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
>>         at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
>>         at org.apache.ctakes.dictionary.lookup2.dictionary.DictionaryDescriptorParser.parseDictionary(DictionaryDescriptorParser.java:196)
>>         ... 48 more
>> Caused by: java.lang.StringIndexOutOfBoundsException: String index out of range: -7
>>         at java.lang.String.substring(String.java:1967)
>>         at org.apache.ctakes.dictionary.lookup2.util.JdbcConnectionFactory.getConnectionUrl(JdbcConnectionFactory.java:110)
>>         at org.apache.ctakes.dictionary.lookup2.util.JdbcConnectionFactory.getConnection(JdbcConnectionFactory.java:63)
>>         at org.apache.ctakes.dictionary.lookup2.dictionary.JdbcRareWordDictionary.<init>(JdbcRareWordDictionary.java:91)
>>         at org.apache.ctakes.dictionary.lookup2.dictionary.JdbcRareWordDictionary.<init>(JdbcRareWordDictionary.java:72)
>>         at org.apache.ctakes.dictionary.lookup2.dictionary.UmlsJdbcRareWordDictionary.<init>(UmlsJdbcRareWordDictionary.java:31)
>>         ... 53 more
>>
>>
>> So, looking in "getConnectionUrl," we have this method:
>>
>> static private String getConnectionUrl( final String jdbcUrl ) throws SQLException {
>>    final String urlDbPath = jdbcUrl.substring( HSQL_FILE_PREFIX.length() );
>>    final String urlFilePath = urlDbPath + HSQL_DB_EXT;
>>    try {
>>       final URL url = FileLocator.getResource( urlFilePath );
>>       final String urlString = url.toExternalForm();
>>       return urlString.substring( 0, urlString.length() - HSQL_DB_EXT.length() ); // <---
>>    } catch ( FileNotFoundException fnfE ) {
>>       throw new SQLException( "No Hsql DB exists at Url", fnfE );
>>    }
>> }
>>
>> The substring method indicated above appears to be what is causing the
>> error - for some reason the "urlString" variable has a length of zero. This
>> seems to indicate that there is something wrong with the cTAKES resources.
>> However, that isn't making much sense to me as the default pipeline, which
>> also relies on the resources package, is working fine. Has anyone
>> encountered something like this before? Does the fast pipeline require some
>> additional resources?
>>
>> As well, for the Spark implementation, we've put the cTAKES jars and
>> resources on each executor at the same location, and are specifying this
>> on the executor classpath.
>>
>> Thanks,
>>
>> Mike
>> --
>> Mike Trepanier| Big Data Engineer | MetiStream, Inc. |
>> m...@metistream.com | 845 - 270 - 3129 (m) |
>> www.metistream.com
>>
>
>


-- 
Mike Trepanier| Big Data Engineer | MetiStream, Inc. |  m...@metistream.com |
845 - 270 - 3129 (m) | www.metistream.com
