The build you should be using is the ant build. Do not use the maven build for this purpose.
- Check out trunk: svn co https://svn.apache.org/repos/asf/manifoldcf/trunk
- Download dependencies: ant make-core-deps
- Build: ant build
- Your deliverable is in the "dist" directory

Karl

On Wed, Jan 10, 2018 at 11:37 AM, msaunier <[email protected]> wrote:

> I have an error with the maven build, so I tested with an external Tika
> Server 1.17, but POI is not included. If you succeed with a mvn package
> using Tika 1.17, I am interested.
>
> Today, I have not had much time to deal with it.
>
> I found some bugs that I will report tomorrow if they are not already
> reported. They concern log4j2, local_fr, and a bug with the web interface
> and the keyboard input key.
>
> I am continuing my investigation.
>
> *From:* Karl Wright [mailto:[email protected]]
> *Sent:* Wednesday, January 10, 2018 17:15
> *To:* [email protected]
> *Subject:* Re: Document connector excluding mime type and size - Tika
> Parser error
>
> Any news?
>
> Karl
>
> On Tue, Jan 9, 2018 at 1:10 PM, Karl Wright <[email protected]> wrote:
>
> Let me know what happens.
> If it works for you, I'll see if we can put together a patch release of
> 2.9 with the fix.
>
> Karl
>
> On Tue, Jan 9, 2018 at 11:07 AM, msaunier <[email protected]> wrote:
>
> Test checking out and building with POI 3.17 and Tika 1.17?
>
> It's possible.
>
> I will finish a project and then test that.
>
> *From:* Karl Wright [mailto:[email protected]]
> *Sent:* Tuesday, January 9, 2018 16:57
> *To:* [email protected]
> *Subject:* Re: Document connector excluding mime type and size - Tika
> Parser error
>
> So here's the problem; we used POI 3.17 with Tika 1.16 in 2.9, in order to
> deal with the classloader issue present in POI 3.15, and because POI 3.16
> has a severe security issue that made it impossible to ship with.
>
> Unfortunately that doesn't quite work; POI 3.17 is not completely
> backwards compatible with 3.16 and therefore problems occur with this
> combination.
>
> The probable solution is to check out and build trunk and see if that
> works for you. It very well might. The question then is what to do next,
> because we are not scheduled to release again until April. We might have
> to do a point release to deal with this.
>
> Please give it a try and let me know what happens.
>
> Thanks,
> Karl
>
> On Tue, Jan 9, 2018 at 10:29 AM, Karl Wright <[email protected]> wrote:
>
> Ok, never mind that last email. We patched it in part in 2.9 by including
> the latest POI. So clearly it's still an existing problem in POI. I'll
> have to open a ticket there and await a patch from them.
>
> Karl
>
> On Tue, Jan 9, 2018 at 10:27 AM, Karl Wright <[email protected]> wrote:
>
> This screenshot cannot be MCF 2.9 since the version of poi was not 3.17
> for the 2.9 release.
>
> Karl
>
> On Tue, Jan 9, 2018 at 10:02 AM, msaunier <[email protected]> wrote:
>
> The 2 versions (2.8.1 and 2.9) of ManifoldCF are on 2 different servers.
>
> *From:* Karl Wright [mailto:[email protected]]
> *Sent:* Tuesday, January 9, 2018 15:54
> *To:* [email protected]
> *Subject:* Re: Document connector excluding mime type and size - Tika
> Parser error
>
> As for the Tika issue, we explicitly tested documents of that type when
> rolling out 2.8.1. When we updated 2.8.1 to a new Tika in 2.9 I believe we
> also tested this.
>
> One of the potential issues is that if you are dropping down different
> versions of ManifoldCF into the same directories you *might* have a poi*
> jar in the wrong place because of the way we had to do the patch. Please
> have a look at where the poi* jars are in your directory structure; they
> should all be in one directory (connector-common-lib). If you see any
> anywhere else, that's the cause of the issue.
>
> Karl
>
> On Tue, Jan 9, 2018 at 9:43 AM, Karl Wright <[email protected]> wrote:
>
> Since the Tika extractor essentially filters out the content mime type
> (other than presenting it as metadata), you need to put an "allowed
> documents" transformation connection into your job pipeline BEFORE the Tika
> connector:
>
> https://manifoldcf.apache.org/release/release-2.9/en_US/end-user-documentation.html#alloweddocuments
>
> In fact, mime type exclusion is actually disabled in the Solr output
> connector *unless* you are using the extracting update handler. That
> should resolve the one problem for you.
>
> Thanks,
> Karl
>
> On Tue, Jan 9, 2018 at 9:35 AM, msaunier <[email protected]> wrote:
>
> The documents for Tika are:
>
> - Microsoft Word 97-2003
> - Application/msword
>
> I can't get more information; they are on SCO servers, and SCO does not
> have the ls -lisan or stat commands.
>
> For the Solr connection, I seem to have emptied the index before the last
> indexing run (ManifoldCF and Solr). I will do it again to be sure.
>
> *From:* Karl Wright [mailto:[email protected]]
> *Sent:* Tuesday, January 9, 2018 15:26
> *To:* [email protected]
> *Subject:* Re: Document connector excluding mime type and size - Tika
> Parser error
>
> CONNECTORS-1482 is for the Solr connector filtering issue. A question:
> When you changed these fields in the output connection, had you already
> indexed any documents? Those would only get cleaned up if you did a
> subsequent full crawl, after you made the connection change.
>
> Karl
>
> On Tue, Jan 9, 2018 at 9:22 AM, Karl Wright <[email protected]> wrote:
>
> If you let me know what kind of file they are (extension and what
> application created them) that is probably good enough.
>
> Karl
>
> On Tue, Jan 9, 2018 at 9:19 AM, msaunier <[email protected]> wrote:
>
> Okay, good. I will see if I can test Tika version 1.17.
>
> I can't transfer a document with this error; they are private. Sorry.
>
> If I encounter the error again on a non-private document, I'll come back
> to you.
>
> *From:* Karl Wright [mailto:[email protected]]
> *Sent:* Tuesday, January 9, 2018 15:12
> *To:* [email protected]
> *Subject:* Re: Document connector excluding mime type and size - Tika
> Parser error
>
> CONNECTORS-1481 is the ticket for the Tika problem.
>
> Karl
>
> On Tue, Jan 9, 2018 at 8:34 AM, Karl Wright <[email protected]> wrote:
>
> Ok, if you are in a position to build trunk, that's a newer version of
> Tika (1.17) which might (or might not) address this problem.
>
> If you could create a ticket, I'd greatly appreciate it if you attached
> one document to it that causes the failure.
>
> Thanks!
> Karl
>
> On Tue, Jan 9, 2018 at 8:02 AM, msaunier <[email protected]> wrote:
>
> It's version 2.9.
>
> I have a 2.8.1 on another server with the same job and the same documents.
> I will test on this other server and report back.
>
> Thanks for your help.
>
> *From:* Karl Wright [mailto:[email protected]]
> *Sent:* Tuesday, January 9, 2018 13:15
> *To:* [email protected]
> *Subject:* Re: Document connector excluding mime type and size - Tika
> Parser error
>
> I looked at the history of this. We had to release a patch (2.8.1) that
> put various poi jars at root level in order to work around a Tika problem.
> That patch may not have been entirely correct in that it looks like it may
> have blocked access by one of the deeper jars to a higher level.
>
> Release 2.9 should fix this if I am correct.
>
> Karl
>
> On Tue, Jan 9, 2018 at 6:39 AM, Karl Wright <[email protected]> wrote:
>
> What version of MCF is this? That's important to know since Tika has had
> problems with this kind of thing in the past and this looks like something
> similar.
>
> The problem you are reporting is due to either a missing jar, or a bug in
> an internal tika classloader. But I need to know whether this is a current
> bug or not, since we just went to a new Tika version.
>
> Karl
>
> On Tue, Jan 9, 2018 at 4:32 AM, msaunier <[email protected]> wrote:
>
> Hello Karl,
>
> I hope you are well today.
>
> I have 2 problems with ManifoldCF.
>
> -----------
>
> In **Output connectors**, with the Solr connector: I have added a "Maximum
> document length" and I have excluded 5 mime types, but it does not work. I
> have attached a screen capture.
>
> ----------
>
> Second, I have a **Tika exception** in ManifoldCF. 3 documents are
> blocked:
>
> FATAL 2018-01-09T10:19:54,992 (Worker thread '5') - Error tossed: org.apache.poi.hwmf.record.HwmfFont.getCharSet()Lorg/apache/poi/hwmf/record/HwmfFont$WmfCharset;
> java.lang.NoSuchMethodError: org.apache.poi.hwmf.record.HwmfFont.getCharSet()Lorg/apache/poi/hwmf/record/HwmfFont$WmfCharset;
>     at org.apache.tika.parser.microsoft.WMFParser.parse(WMFParser.java:74) ~[?:?]
>     at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280) ~[?:?]
>     at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280) ~[?:?]
>     at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:135) ~[?:?]
>     at org.apache.tika.parser.DelegatingParser.parse(DelegatingParser.java:72) ~[?:?]
>     at org.apache.tika.extractor.ParsingEmbeddedDocumentExtractor.parseEmbedded(ParsingEmbeddedDocumentExtractor.java:102) ~[?:?]
>     at org.apache.tika.parser.microsoft.ooxml.AbstractOOXMLExtractor.handleEmbeddedFile(AbstractOOXMLExtractor.java:375) ~[?:?]
>     at org.apache.tika.parser.microsoft.ooxml.AbstractOOXMLExtractor.handleEmbeddedPart(AbstractOOXMLExtractor.java:260) ~[?:?]
>     at org.apache.tika.parser.microsoft.ooxml.AbstractOOXMLExtractor.handleEmbeddedParts(AbstractOOXMLExtractor.java:205) ~[?:?]
>     at org.apache.tika.parser.microsoft.ooxml.AbstractOOXMLExtractor.getXHTML(AbstractOOXMLExtractor.java:142) ~[?:?]
>     at org.apache.tika.parser.microsoft.ooxml.OOXMLExtractorFactory.parse(OOXMLExtractorFactory.java:142) ~[?:?]
>     at org.apache.tika.parser.microsoft.ooxml.OOXMLParser.parse(OOXMLParser.java:106) ~[?:?]
>     at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280) ~[?:?]
>     at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280) ~[?:?]
>     at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:135) ~[?:?]
>     at org.apache.manifoldcf.agents.transformation.tika.TikaParser.parse(TikaParser.java:74) ~[?:?]
>     at org.apache.manifoldcf.agents.transformation.tika.TikaExtractor.addOrReplaceDocumentWithException(TikaExtractor.java:235) ~[?:?]
>     at org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester$PipelineAddEntryPoint.addOrReplaceDocumentWithException(IncrementalIngester.java:3226) ~[mcf-agents.jar:?]
>     at org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester$PipelineAddFanout.sendDocument(IncrementalIngester.java:3077) ~[mcf-agents.jar:?]
>     at org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester$PipelineObjectWithVersions.addOrReplaceDocumentWithException(IncrementalIngester.java:2708) ~[mcf-agents.jar:?]
>     at org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester.documentIngest(IncrementalIngester.java:756) ~[mcf-agents.jar:?]
>     at org.apache.manifoldcf.crawler.system.WorkerThread$ProcessActivity.ingestDocumentWithException(WorkerThread.java:1583) ~[mcf-pull-agent.jar:?]
>     at org.apache.manifoldcf.crawler.system.WorkerThread$ProcessActivity.ingestDocumentWithException(WorkerThread.java:1548) ~[mcf-pull-agent.jar:?]
>     at org.apache.manifoldcf.crawler.connectors.sharedrive.SharedDriveConnector.processDocuments(SharedDriveConnector.java:939) ~[?:?]
>     at org.apache.manifoldcf.crawler.system.WorkerThread.run(WorkerThread.java:399) [mcf-pull-agent.jar:?]
>
> Do I need to create an incident ticket?
>
> ----------
>
> Thanks for your help.
>
> Regards,
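
The jar check suggested in the quoted thread (all poi* jars should sit in connector-common-lib, and any found elsewhere are suspect) could be scripted roughly as follows. This is only a sketch under stated assumptions: the function name find_misplaced_poi_jars is hypothetical, it is not part of ManifoldCF, and it assumes you pass the root of the unpacked "dist" directory.

```python
from pathlib import Path


def find_misplaced_poi_jars(install_root, expected_dir="connector-common-lib"):
    """Return poi*.jar files under install_root whose parent directory
    is not named expected_dir.

    Hypothetical helper for illustration only; the directory name comes
    from the thread, everything else is an assumption.
    """
    root = Path(install_root)
    return sorted(
        jar
        for jar in root.rglob("poi*.jar")  # recursive search for poi jars
        if jar.parent.name != expected_dir  # keep only those in the wrong place
    )


# Example call (the path is a placeholder, not a real default):
# for jar in find_misplaced_poi_jars("/path/to/dist"):
#     print(jar)
```

An empty result means every poi* jar was found directly inside a connector-common-lib directory; any path that is printed is a candidate for the misplaced jar described above.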
