Hi All,

We've been attempting to scale our cTAKES pipeline on top of Spark, so we've switched from using the "getDefaultPipeline" method to the "getFastPipeline" method to boost processing speed. However, while the default pipeline works fine with Spark, the fast pipeline throws the error below (edited down to the cTAKES portion of the stack trace):

Caused by: org.apache.uima.resource.ResourceInitializationException: MESSAGE LOCALIZATION FAILED: Can't find resource for bundle java.util.PropertyResourceBundle, key Could not construct org.apache.ctakes.dictionary.lookup2.dictionary.UmlsJdbcRareWordDictionary
    at org.apache.ctakes.dictionary.lookup2.ae.AbstractJCasTermAnnotator.initialize(AbstractJCasTermAnnotator.java:131)
    at org.apache.uima.analysis_engine.impl.PrimitiveAnalysisEngine_impl.initializeAnalysisComponent(PrimitiveAnalysisEngine_impl.java:266)
    ... 44 more
Caused by: org.apache.uima.analysis_engine.annotator.AnnotatorContextException: MESSAGE LOCALIZATION FAILED: Can't find resource for bundle java.util.PropertyResourceBundle, key Could not construct org.apache.ctakes.dictionary.lookup2.dictionary.UmlsJdbcRareWordDictionary
    at org.apache.ctakes.dictionary.lookup2.dictionary.DictionaryDescriptorParser.parseDictionary(DictionaryDescriptorParser.java:199)
    at org.apache.ctakes.dictionary.lookup2.dictionary.DictionaryDescriptorParser.parseDictionaries(DictionaryDescriptorParser.java:156)
    at org.apache.ctakes.dictionary.lookup2.dictionary.DictionaryDescriptorParser.parseDescriptor(DictionaryDescriptorParser.java:128)
    at org.apache.ctakes.dictionary.lookup2.ae.AbstractJCasTermAnnotator.initialize(AbstractJCasTermAnnotator.java:129)
    ... 45 more
Caused by: java.lang.reflect.InvocationTargetException
    at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
    at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
    at org.apache.ctakes.dictionary.lookup2.dictionary.DictionaryDescriptorParser.parseDictionary(DictionaryDescriptorParser.java:196)
    ... 48 more
Caused by: java.lang.StringIndexOutOfBoundsException: String index out of range: -7
    at java.lang.String.substring(String.java:1967)
    at org.apache.ctakes.dictionary.lookup2.util.JdbcConnectionFactory.getConnectionUrl(JdbcConnectionFactory.java:110)
    at org.apache.ctakes.dictionary.lookup2.util.JdbcConnectionFactory.getConnection(JdbcConnectionFactory.java:63)
    at org.apache.ctakes.dictionary.lookup2.dictionary.JdbcRareWordDictionary.<init>(JdbcRareWordDictionary.java:91)
    at org.apache.ctakes.dictionary.lookup2.dictionary.JdbcRareWordDictionary.<init>(JdbcRareWordDictionary.java:72)
    at org.apache.ctakes.dictionary.lookup2.dictionary.UmlsJdbcRareWordDictionary.<init>(UmlsJdbcRareWordDictionary.java:31)
    ... 53 more

Looking at "getConnectionUrl", we have this method:

    static private String getConnectionUrl( final String jdbcUrl ) throws SQLException {
       final String urlDbPath = jdbcUrl.substring( HSQL_FILE_PREFIX.length() );
       final String urlFilePath = urlDbPath + HSQL_DB_EXT;
       try {
          final URL url = FileLocator.getResource( urlFilePath );
          final String urlString = url.toExternalForm();
          return urlString.substring( 0, urlString.length() - HSQL_DB_EXT.length() ); // <---
       } catch ( FileNotFoundException fnfE ) {
          throw new SQLException( "No Hsql DB exists at Url", fnfE );
       }
    }

The substring call marked above appears to be what is causing the error: for some reason the "urlString" variable has a length of zero, so the end index passed to substring is negative. That seems to indicate that something is wrong with the cTAKES resources. However, this doesn't make much sense to me, because the default pipeline, which also relies on the resources package, works fine.

Has anyone encountered something like this before? Does the fast pipeline require some additional resources?

Also, for the Spark implementation, we've put the cTAKES jars and resources at the same location on each executor, and we are specifying that location on the executor classpath.
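
To illustrate the failure mode at the marked substring call, here is a minimal standalone sketch. It is not cTAKES code: the class name ConnectionUrlSketch and both helper methods are hypothetical, and the value ".script" for HSQL_DB_EXT is an assumption (the definition is not shown above; a seven-character extension would be consistent with the -7 in the trace if urlString is empty). The checked variant shows one way such a call could be guarded so that a bad URL is reported directly instead of surfacing as an index error.

```java
import java.sql.SQLException;

// Hypothetical demo class; not part of cTAKES.
final class ConnectionUrlSketch {

    // Assumed value; the post does not show how HSQL_DB_EXT is defined.
    static final String HSQL_DB_EXT = ".script";

    // Mirrors the marked line: strips the extension without checking the input.
    static String stripUnchecked(final String urlString) {
        return urlString.substring(0, urlString.length() - HSQL_DB_EXT.length());
    }

    // Guarded variant: fails with a descriptive SQLException when the URL
    // does not end with the expected extension (this covers the empty string).
    static String stripChecked(final String urlString) throws SQLException {
        if (urlString == null || !urlString.endsWith(HSQL_DB_EXT)) {
            throw new SQLException("Resolved URL does not end with "
                    + HSQL_DB_EXT + ": '" + urlString + "'");
        }
        return urlString.substring(0, urlString.length() - HSQL_DB_EXT.length());
    }

    public static void main(final String[] args) throws SQLException {
        // Example path is made up for illustration.
        System.out.println(stripChecked("file:/opt/ctakes/resources/example.script"));
        try {
            stripUnchecked(""); // end index becomes 0 - 7 = -7
        } catch (StringIndexOutOfBoundsException e) {
            // The message wording differs between Java versions.
            System.out.println("unchecked strip failed: " + e.getMessage());
        }
    }
}
```

With an empty input, the unchecked variant throws StringIndexOutOfBoundsException, while the checked variant reports the offending URL, which may make it easier to see what the resource lookup actually returned on an executor.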

Thanks,
Mike

--
Mike Trepanier | Big Data Engineer | MetiStream, Inc. | m...@metistream.com | 845 - 270 - 3129 (m) | www.metistream.com