I've created a ticket, CONNECTORS-1229. Will be looking at this shortly.

Karl

On Wed, Aug 19, 2015 at 8:21 AM, Mike Caceres <[email protected]> wrote:
> Thank you for the examples, Karl.
>
> However, when I include this definition in the job and then run the job,
> ManifoldCF seems to enter some kind of loop in the running state. Looking
> at the manifoldcf.log file, I see many entries like this:
>
> >>>>>>
>
> FATAL 2015-08-19 07:51:48,231 (Worker thread '70') - org.apache.manifoldcf.crawlerthreads - Error tossed: null
> java.lang.NullPointerException
>     at org.apache.manifoldcf.agents.transformation.forcedmetadata.ForcedMetadataConnector.append(ForcedMetadataConnector.java:646)
>     at org.apache.manifoldcf.agents.transformation.forcedmetadata.ForcedMetadataConnector.processExpression(ForcedMetadataConnector.java:678)
>     at org.apache.manifoldcf.agents.transformation.forcedmetadata.ForcedMetadataConnector.addOrReplaceDocumentWithException(ForcedMetadataConnector.java:134)
>     at org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester$PipelineAddEntryPoint.addOrReplaceDocumentWithException(IncrementalIngester.java:3221)
>     at org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester$PipelineAddFanout.sendDocument(IncrementalIngester.java:3072)
>     at org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester$PipelineObjectWithVersions.addOrReplaceDocumentWithException(IncrementalIngester.java:2706)
>     at org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester.documentIngest(IncrementalIngester.java:756)
>     at org.apache.manifoldcf.crawler.system.WorkerThread$ProcessActivity.ingestDocumentWithException(WorkerThread.java:1503)
>     at org.apache.manifoldcf.crawler.system.WorkerThread$ProcessActivity.ingestDocumentWithException(WorkerThread.java:1468)
>     at org.apache.manifoldcf.crawler.connectors.DCTM.DCTM.processDocuments(DCTM.java:1813)
>     at org.apache.manifoldcf.crawler.system.WorkerThread.run(WorkerThread.java:379)
>
> <<<<<<
>
> This may or may not be related to these earlier messages in the same log
> file:
>
> >>>>>>
> INFO 2015-08-19 07:47:47,307 (main) - org.apache.manifoldcf.root - Synchronization storage cleaned up
> INFO 2015-08-19 07:48:07,830 (main) - org.apache.manifoldcf.root - Running...
> INFO 2015-08-19 07:48:07,846 (main) - org.apache.manifoldcf.root - Running...
> INFO 2015-08-19 07:48:07,994 (Agents thread) - org.apache.manifoldcf.jobs - Cleaning up all process data
> INFO 2015-08-19 07:48:08,036 (Agents thread) - org.apache.manifoldcf.jobs - Cleanup complete
> INFO 2015-08-19 07:48:08,064 (Agents thread) - org.apache.manifoldcf.jobs - Starting cluster
> INFO 2015-08-19 07:48:08,072 (Agents thread) - org.apache.manifoldcf.jobs - Cluster start complete
> INFO 2015-08-19 07:48:08,075 (Agents thread) - org.apache.manifoldcf.root - Starting up pull-agent...
> INFO 2015-08-19 07:48:08,088 (Agents thread) - org.apache.manifoldcf.root - Starting up pull-agent...
> INFO 2015-08-19 07:48:08,133 (Agents thread) - org.apache.manifoldcf.root - Pull-agent started
> INFO 2015-08-19 07:48:08,182 (Agents thread) - org.apache.manifoldcf.root - Pull-agent started
> ERROR 2015-08-19 07:48:44,184 (qtp858007949-11) - org.apache.manifoldcf.misc - Missing resource 'ForcedMetadata.ForcedMetadataNameMustNotBeNull' in bundle 'org.apache.manifoldcf.agents.transformation.forcedmetadata.common' for locale 'en_US'
> java.util.MissingResourceException: Can't find resource for bundle java.util.PropertyResourceBundle, key ForcedMetadata.ForcedMetadataNameMustNotBeNull
>     at java.util.ResourceBundle.getObject(ResourceBundle.java:395)
>     at java.util.ResourceBundle.getString(ResourceBundle.java:355)
>     at org.apache.manifoldcf.core.i18n.Messages.getMessage(Messages.java:193)
>     at org.apache.manifoldcf.core.i18n.Messages.getString(Messages.java:240)
>     at org.apache.manifoldcf.core.i18n.Messages.getString(Messages.java:208)
>     at org.apache.manifoldcf.ui.i18n.ResourceBundleWrapper.getString(ResourceBundleWrapper.java:44)
>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>     at .......
> <<<<<<
>
> If I edit the job definition, remove the regular expression, and save the
> job, then almost immediately I see these entries in the log:
>
> >>>>>>
> INFO 2015-08-19 07:52:28,300 (Finisher thread) - org.apache.manifoldcf.jobs - Marked job 1439951495926 for shutdown
> INFO 2015-08-19 07:52:28,434 (Job reset thread) - org.apache.manifoldcf.jobs - Job 1439951495926 now completed
> INFO 2015-08-19 07:52:38,332 (Job notification thread) - org.apache.manifoldcf.jobs - Found job 1439951495926 in need of notification
> <<<<<<
>
> Thank you,
>
> Mike
> ------------------------------
> Date: Wed, 19 Aug 2015 03:45:30 -0400
> Subject: Re: Metadata expressions
> From: [email protected]
> To: [email protected]
>
> Hi Mike,
>
> The documentation (which does not seem to have been updated on the site
> yet) says the following:
>
> >>>>>>
> <p>You can also use regular expressions in the
> substitution string, for example: "${there|[0-9]*}", which will extract the
> first sequence of sequential numbers it finds in the
> value of the field "there", or
> "${there|string(.*)|1}", which will include everything following "string"
> in the field value. (The third argument specifies the regular
> expression group number, with an optional suffix of
> "l" or "u" meaning upper-case or lower-case.)</p>
> <p>Enter a parameter name, and either select to remove the
> value or provide an expression. If you chose to supply an expression,
> enter the expression in the box.
> <<<<<<
>
> To evaluate your regular expression against the specific input you gave,
> I typically use a regex applet, if you can find a browser that still
> allows it:
>
> http://www.cis.upenn.edu/~matuszek/General/RegexTester/regex-tester.html
>
> Dropping your stuff in and clicking the "find()" button yields this:
> "Pattern did not match"
>
> So your regex is not correct: it looks for the literal text "string",
> which does not occur anywhere in the value. But "Protocol (\d+)" does
> match, with the following group outputs:
>
> start() = 0, end() = 16
> group(0) = "Protocol 1234500"
> group(1) = "1234500"
>
> So you want group 1. Therefore, the MCF expression would be:
>
> expression = Protocol-${protocol_name|Protocol (\d+)|1}
>
> Thanks,
> Karl
>
> On Tue, Aug 18, 2015 at 11:19 PM, Mike Caceres <[email protected]>
> wrote:
>
> If I have a document with the following metadata value:
> "protocol_name" : "Protocol 1234500 (USPA00012345) second version"
>
> and I want to produce a new metadata field that looks like this:
>
> "protocol_id" : "Protocol-1234500"
>
> should the metadata expression look like this?
>
> parameter name = protocol_id
> remove this parameter = false
> expression = Protocol-${protocol_name|string(\d+)|0}
>
> Thank you!
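If the applet will not load, the same check Karl ran can be reproduced with java.util.regex; the applet's find()/start()/end()/group() output appears to mirror the java.util.regex.Matcher API. The sketch below is only an illustration: the extract helper and the ProtocolIdDemo class are hypothetical names, and it assumes (from the documentation's "first sequence ... it finds" wording) that a ${field|regex|group} expression uses the first find() match. It is not the ForcedMetadataConnector's actual evaluation code.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ProtocolIdDemo {
    // Hypothetical helper approximating "${field|regex|group}":
    // returns the requested group of the first match, or null if none.
    // Assumption: not how ForcedMetadataConnector is implemented.
    static String extract(String value, String regex, int group) {
        Matcher m = Pattern.compile(regex).matcher(value);
        // find() scans for the first match anywhere in the value,
        // like the applet's "find()" button.
        return m.find() ? m.group(group) : null;
    }

    public static void main(String[] args) {
        String protocolName = "Protocol 1234500 (USPA00012345) second version";

        // Mike's original pattern: the literal "string" never occurs.
        System.out.println(extract(protocolName, "string(\\d+)", 0)); // null

        // Karl's pattern: report the match range as the applet does.
        Matcher m = Pattern.compile("Protocol (\\d+)").matcher(protocolName);
        if (m.find()) {
            System.out.println("start() = " + m.start() + ", end() = " + m.end());
        }

        // Group 0 is the whole match; group 1 is just the digits.
        System.out.println(extract(protocolName, "Protocol (\\d+)", 0));
        System.out.println(extract(protocolName, "Protocol (\\d+)", 1));

        // Equivalent of: Protocol-${protocol_name|Protocol (\d+)|1}
        System.out.println("Protocol-" + extract(protocolName, "Protocol (\\d+)", 1));
    }
}
```

Running it should print null, then start() = 0, end() = 16, then Protocol 1234500, 1234500, and finally Protocol-1234500, which lines up with the applet results quoted above. Note that Java source needs the backslash doubled ("\\d+"), whereas the MCF expression box takes the regex as typed.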
