Hi All,

We've been attempting to scale our cTAKES pipeline on top of Spark, so
we've switched from using the "getDefaultPipeline" method to the
"getFastPipeline" method to boost processing speed. However, while the
default pipeline works fine with Spark, the fast pipeline throws the
error below (edited down to the cTAKES portion of the stack trace):


Caused by: org.apache.uima.resource.ResourceInitializationException: MESSAGE LOCALIZATION FAILED: Can't find resource for bundle java.util.PropertyResourceBundle, key Could not construct org.apache.ctakes.dictionary.lookup2.dictionary.UmlsJdbcRareWordDictionary
        at org.apache.ctakes.dictionary.lookup2.ae.AbstractJCasTermAnnotator.initialize(AbstractJCasTermAnnotator.java:131)
        at org.apache.uima.analysis_engine.impl.PrimitiveAnalysisEngine_impl.initializeAnalysisComponent(PrimitiveAnalysisEngine_impl.java:266)
        ... 44 more
Caused by: org.apache.uima.analysis_engine.annotator.AnnotatorContextException: MESSAGE LOCALIZATION FAILED: Can't find resource for bundle java.util.PropertyResourceBundle, key Could not construct org.apache.ctakes.dictionary.lookup2.dictionary.UmlsJdbcRareWordDictionary
        at org.apache.ctakes.dictionary.lookup2.dictionary.DictionaryDescriptorParser.parseDictionary(DictionaryDescriptorParser.java:199)
        at org.apache.ctakes.dictionary.lookup2.dictionary.DictionaryDescriptorParser.parseDictionaries(DictionaryDescriptorParser.java:156)
        at org.apache.ctakes.dictionary.lookup2.dictionary.DictionaryDescriptorParser.parseDescriptor(DictionaryDescriptorParser.java:128)
        at org.apache.ctakes.dictionary.lookup2.ae.AbstractJCasTermAnnotator.initialize(AbstractJCasTermAnnotator.java:129)
        ... 45 more
Caused by: java.lang.reflect.InvocationTargetException
        at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
        at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
        at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
        at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
        at org.apache.ctakes.dictionary.lookup2.dictionary.DictionaryDescriptorParser.parseDictionary(DictionaryDescriptorParser.java:196)
        ... 48 more
Caused by: java.lang.StringIndexOutOfBoundsException: String index out of range: -7
        at java.lang.String.substring(String.java:1967)
        at org.apache.ctakes.dictionary.lookup2.util.JdbcConnectionFactory.getConnectionUrl(JdbcConnectionFactory.java:110)
        at org.apache.ctakes.dictionary.lookup2.util.JdbcConnectionFactory.getConnection(JdbcConnectionFactory.java:63)
        at org.apache.ctakes.dictionary.lookup2.dictionary.JdbcRareWordDictionary.<init>(JdbcRareWordDictionary.java:91)
        at org.apache.ctakes.dictionary.lookup2.dictionary.JdbcRareWordDictionary.<init>(JdbcRareWordDictionary.java:72)
        at org.apache.ctakes.dictionary.lookup2.dictionary.UmlsJdbcRareWordDictionary.<init>(UmlsJdbcRareWordDictionary.java:31)
        ... 53 more


So, looking in "getConnectionUrl," we have this method:

static private String getConnectionUrl( final String jdbcUrl ) throws SQLException {
   final String urlDbPath = jdbcUrl.substring( HSQL_FILE_PREFIX.length() );
   final String urlFilePath = urlDbPath + HSQL_DB_EXT;
   try {
      final URL url = FileLocator.getResource( urlFilePath );
      final String urlString = url.toExternalForm();
      return urlString.substring( 0, urlString.length() - HSQL_DB_EXT.length() ); // <---
   } catch ( FileNotFoundException fnfE ) {
      throw new SQLException( "No Hsql DB exists at Url", fnfE );
   }
}

The substring call marked above appears to be what is causing the error.
An index of -7 means the end index came out negative, which is what you
would get if "urlString" has a length of zero and HSQL_DB_EXT is seven
characters long. That seems to indicate that something is wrong with the
cTAKES resources. However, that doesn't make much sense to me, as the
default pipeline, which also relies on the resources package, works fine.
Has anyone encountered something like this before? Does the fast pipeline
require some additional resources?
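
To make the failure mode concrete, here is a minimal standalone sketch of
the same substring pattern. It is not cTAKES code: the class and method
names are hypothetical, and the ".script" value for the extension is an
assumption on my part, since the real HSQL_DB_EXT constant isn't shown
above. It compares the unchecked call with a guarded variant that reports
the offending string instead of a bare index.

```java
// Hypothetical demo class; not part of cTAKES.
final class SuffixStripDemo {
    // Assumed value for illustration only; the real HSQL_DB_EXT is not shown in this thread.
    static final String DB_EXT = ".script";

    // Same shape as the marked line: trims the extension without checking the length first.
    static String stripUnchecked(String urlString) {
        return urlString.substring(0, urlString.length() - DB_EXT.length());
    }

    // Guarded variant: rejects strings that do not end with the extension,
    // so the error message shows what was actually resolved.
    static String stripChecked(String urlString) {
        if (!urlString.endsWith(DB_EXT)) {
            throw new IllegalArgumentException(
                "Expected a URL ending in " + DB_EXT + " but got: '" + urlString + "'");
        }
        return urlString.substring(0, urlString.length() - DB_EXT.length());
    }

    public static void main(String[] args) {
        // A well-formed URL string has its extension removed.
        System.out.println(stripUnchecked("file:/tmp/dict.script"));

        // An empty string makes the end index negative and substring throws.
        try {
            stripUnchecked("");
        } catch (StringIndexOutOfBoundsException e) {
            System.out.println("unchecked failed: " + e.getMessage());
        }

        // The guarded variant fails earlier with a more descriptive message.
        try {
            stripChecked("");
        } catch (IllegalArgumentException e) {
            System.out.println("checked failed: " + e.getMessage());
        }
    }
}
```

The exact exception message differs between Java versions, so the sketch
prints whatever the runtime reports rather than assuming the text.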

Also, for the Spark implementation, we've put the cTAKES jars and
resources at the same location on each executor, and we are specifying that
location on the executor classpath.
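
One way to sanity-check that setup would be to run something like the
sketch below inside a task on an executor and see whether a given resource
path resolves there. This is only an illustration: the class name is
hypothetical, it uses the plain JDK class loader rather than cTAKES's
FileLocator (which may search additional locations), and the resource paths
to check are whatever you pass in, since I'm not certain which dictionary
files the fast pipeline expects.

```java
import java.net.URL;

// Hypothetical helper; not part of cTAKES or Spark.
final class ClasspathResourceCheck {
    // Looks the path up through the context class loader, falling back to the system loader.
    static URL locate(String resourcePath) {
        ClassLoader loader = Thread.currentThread().getContextClassLoader();
        if (loader == null) {
            loader = ClassLoader.getSystemClassLoader();
        }
        return loader.getResource(resourcePath);
    }

    // Returns a one-line report that can be logged from inside a task.
    static String describe(String resourcePath) {
        URL url = locate(resourcePath);
        if (url == null) {
            return "MISSING: " + resourcePath;
        }
        return "FOUND: " + resourcePath + " -> " + url.toExternalForm();
    }

    public static void main(String[] args) {
        // Pass the resource paths to check as command-line arguments.
        for (String path : args) {
            System.out.println(describe(path));
        }
    }
}
```

A "MISSING" line for the dictionary's database file on an executor, but not
on the driver, would point at the classpath setup rather than at the
pipeline itself.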

Thanks,

Mike
-- 
Mike Trepanier| Big Data Engineer | MetiStream, Inc. |  m...@metistream.com |
845 - 270 - 3129 (m) | www.metistream.com
