Hi Mike,

Can you tell me what version of MCF you are using?

Thanks,
Karl

On Wed, Aug 19, 2015 at 9:56 AM, Karl Wright wrote:
> I've created a ticket, CONNECTORS-1229. Will be looking at this shortly.
>
> Karl
>
>
> On Wed, Aug 19, 2015 at 8:21 AM, Mike Caceres wrote:
>
>> Thank you for the examples, Karl.
>>
>> However, when I include this expression in the job definition and then
>> run the job, ManifoldCF seems to enter some kind of loop while in the
>> running state. Looking at the manifoldcf.log file, I see many entries
>> like this:
>>
>> >>>>>>
>>
>> FATAL 2015-08-19 07:51:48,231 (Worker thread '70') - org.apache.manifoldcf.crawlerthreads - Error tossed: null
>> java.lang.NullPointerException
>> at org.apache.manifoldcf.agents.transformation.forcedmetadata.ForcedMetadataConnector.append(ForcedMetadataConnector.java:646)
>> at org.apache.manifoldcf.agents.transformation.forcedmetadata.ForcedMetadataConnector.processExpression(ForcedMetadataConnector.java:678)
>> at org.apache.manifoldcf.agents.transformation.forcedmetadata.ForcedMetadataConnector.addOrReplaceDocumentWithException(ForcedMetadataConnector.java:134)
>> at org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester$PipelineAddEntryPoint.addOrReplaceDocumentWithException(IncrementalIngester.java:3221)
>> at org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester$PipelineAddFanout.sendDocument(IncrementalIngester.java:3072)
>> at org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester$PipelineObjectWithVersions.addOrReplaceDocumentWithException(IncrementalIngester.java:2706)
>> at org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester.documentIngest(IncrementalIngester.java:756)
>> at org.apache.manifoldcf.crawler.system.WorkerThread$ProcessActivity.ingestDocumentWithException(WorkerThread.java:1503)
>> at org.apache.manifoldcf.crawler.system.WorkerThread$ProcessActivity.ingestDocumentWithException(WorkerThread.java:1468)
>> at org.apache.manifoldcf.crawler.connectors.DCTM.DCTM.processDocuments(DCTM.java:1813)
>> at org.apache.manifoldcf.crawler.system.WorkerThread.run(WorkerThread.java:379)
>>
>> <<<<<<
>>
>> This may or may not be related to these earlier messages in the same log
>> file:
>>
>> >>>>>>
>> INFO 2015-08-19 07:47:47,307 (main) - org.apache.manifoldcf.root - Synchronization storage cleaned up
>> INFO 2015-08-19 07:48:07,830 (main) - org.apache.manifoldcf.root - Running...
>> INFO 2015-08-19 07:48:07,846 (main) - org.apache.manifoldcf.root - Running...
>> INFO 2015-08-19 07:48:07,994 (Agents thread) - org.apache.manifoldcf.jobs - Cleaning up all process data
>> INFO 2015-08-19 07:48:08,036 (Agents thread) - org.apache.manifoldcf.jobs - Cleanup complete
>> INFO 2015-08-19 07:48:08,064 (Agents thread) - org.apache.manifoldcf.jobs - Starting cluster
>> INFO 2015-08-19 07:48:08,072 (Agents thread) - org.apache.manifoldcf.jobs - Cluster start complete
>> INFO 2015-08-19 07:48:08,075 (Agents thread) - org.apache.manifoldcf.root - Starting up pull-agent...
>> INFO 2015-08-19 07:48:08,088 (Agents thread) - org.apache.manifoldcf.root - Starting up pull-agent...
>> INFO 2015-08-19 07:48:08,133 (Agents thread) - org.apache.manifoldcf.root - Pull-agent started
>> INFO 2015-08-19 07:48:08,182 (Agents thread) - org.apache.manifoldcf.root - Pull-agent started
>> ERROR 2015-08-19 07:48:44,184 (qtp858007949-11) - org.apache.manifoldcf.misc - Missing resource 'ForcedMetadata.ForcedMetadataNameMustNotBeNull' in bundle 'org.apache.manifoldcf.agents.transformation.forcedmetadata.common' for locale 'en_US'
>> java.util.MissingResourceException: Can't find resource for bundle java.util.PropertyResourceBundle, key ForcedMetadata.ForcedMetadataNameMustNotBeNull
>> at java.util.ResourceBundle.getObject(ResourceBundle.java:395)
>> at java.util.ResourceBundle.getString(ResourceBundle.java:355)
>> at org.apache.manifoldcf.core.i18n.Messages.getMessage(Messages.java:193)
>> at org.apache.manifoldcf.core.i18n.Messages.getString(Messages.java:240)
>> at org.apache.manifoldcf.core.i18n.Messages.getString(Messages.java:208)
>> at org.apache.manifoldcf.ui.i18n.ResourceBundleWrapper.getString(ResourceBundleWrapper.java:44)
>> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>> at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>> at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>> at .......
>> <<<<<<
>>
>> If I edit the job definition, remove the regular expression, and save
>> the job, then almost immediately I see these entries in the log:
>>
>> >>>>>>
>> INFO 2015-08-19 07:52:28,300 (Finisher thread) - org.apache.manifoldcf.jobs - Marked job 1439951495926 for shutdown
>> INFO 2015-08-19 07:52:28,434 (Job reset thread) - org.apache.manifoldcf.jobs - Job 1439951495926 now completed
>> INFO 2015-08-19 07:52:38,332 (Job notification thread) - org.apache.manifoldcf.jobs - Found job 1439951495926 in need of notification
>> <<<<<<
>>
>> Thank you,
>>
>> Mike
>> ------------------------------
>> Date: Wed, 19 Aug 2015 03:45:30 -0400
>> Subject: Re: Metadata expressions
>>
>>
>> Hi Mike,
>>
>> The documentation (which seems not to have been updated on the site yet)
>> says the following:
>>
>> >>>>>>
>> You can also use regular expressions in the substitution string, for
>> example: "${there|[0-9]*}", which will extract the first sequence of
>> consecutive digits it finds in the value of the field "there", or
>> "${there|string(.*)|1}", which will include everything following "string"
>> in the field value. (The third argument specifies the regular expression
>> group number, with an optional suffix of "l" or "u" meaning lower-case or
>> upper-case.)
>>
>> Enter a parameter name, and either select to remove the value or provide
>> an expression. If you choose to supply an expression, enter the expression
>> in the box.
>> <<<<<<
>>
>> To evaluate your regular expression with the specific input you gave, I
>> typically use a regex applet, if you can find a browser that still allows
>> it:
>>
>> http://www.cis.upenn.edu/~matuszek/General/RegexTester/regex-tester.html
>>
>> Dropping your stuff in and clicking the "find()" button yields this:
>> "Pattern did not match"
>>
>> So your regex is not correct.
>> But "Protocol (\d+)" does match, with the following group outputs:
>>
>> start() = 0, end() = 16
>> group(0) = "Protocol 1234500"
>> group(1) = "1234500"
>>
>> So you want group 1. Therefore, the MCF expression would be:
>>
>> expression = Protocol-${protocol_name|Protocol (\d+)|1}
>>
>> Thanks,
>> Karl
>>
>>
>>
>> On Tue, Aug 18, 2015 at 11:19 PM, Mike Caceres wrote:
>>
>> If I have a document with the following metadata value:
>> "protocol_name" : "Protocol 1234500 (USPA00012345) second version"
>>
>> and I want to produce a new metadata field that looks like this:
>>
>> "protocol_id" : "Protocol-1234500"
>>
>> should the metadata expression look like this?
>>
>> parameter name = protocol_id
>> remove this parameter = false
>> expression = Protocol-${protocol_name|string(\d+)|0}
>>
>> Thank you!
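To reproduce the regex-tester check without an applet, here is a minimal Java sketch built on java.util.regex. It imitates the "${field|regex|group}" substitution described in the documentation quoted above, so the original and corrected expressions from this thread can be compared side by side. It is not ManifoldCF's implementation: the class and method names (MetadataExpressionSketch, expand) are hypothetical, and it assumes that a pattern which does not match contributes an empty string and that the pattern itself contains no "|" or "}" characters.

```java
import java.util.Locale;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 * Illustrative sketch only (hypothetical names): mimics the documented
 * "${field|regex|group[l|u]}" substitution using java.util.regex.
 * This is NOT ManifoldCF's ForcedMetadataConnector code.
 */
public class MetadataExpressionSketch {

    // Matches "${field}", "${field|regex}" or "${field|regex|group[l|u]}".
    // Assumption: the regex part contains no '|' or '}' characters.
    private static final Pattern TOKEN =
        Pattern.compile("\\$\\{([^|}]+)(?:\\|([^|}]*))?(?:\\|(\\d+)([lu]?))?\\}");

    /** Replaces every ${...} token in the expression with its evaluated value. */
    static String expand(String expression, Map<String, String> fields) {
        Matcher token = TOKEN.matcher(expression);
        StringBuilder out = new StringBuilder();
        int last = 0;
        while (token.find()) {
            out.append(expression, last, token.start()); // literal text before the token
            out.append(evaluate(token, fields));
            last = token.end();
        }
        out.append(expression.substring(last)); // literal text after the last token
        return out.toString();
    }

    private static String evaluate(Matcher token, Map<String, String> fields) {
        String value = fields.getOrDefault(token.group(1), "");
        String regex = token.group(2);
        if (regex == null) {
            return value; // plain "${field}": copy the value unchanged
        }
        Matcher m = Pattern.compile(regex).matcher(value);
        if (!m.find()) {
            return ""; // assumption: a non-matching pattern contributes nothing
        }
        // With no explicit group number, use group 0 (the whole match).
        int group = token.group(3) == null ? 0 : Integer.parseInt(token.group(3));
        String result = m.group(group);
        if ("l".equals(token.group(4))) {
            return result.toLowerCase(Locale.ROOT);
        }
        if ("u".equals(token.group(4))) {
            return result.toUpperCase(Locale.ROOT);
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, String> doc =
            Map.of("protocol_name", "Protocol 1234500 (USPA00012345) second version");
        // Original attempt: the literal text "string" never occurs in the value.
        System.out.println(expand("Protocol-${protocol_name|string(\\d+)|0}", doc));
        // Corrected expression: group 1 captures only the digits.
        System.out.println(expand("Protocol-${protocol_name|Protocol (\\d+)|1}", doc));
    }
}
```

Running main prints "Protocol-" for the original expression, because the pattern never matches, and "Protocol-1234500" for the corrected one, which agrees with the group(1) = "1234500" result from the regex tester.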
