This screenshot cannot be MCF 2.9 since the version of poi was not 3.17 for
the 2.9 release.
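
For reference, a minimal sketch of the poi* jar check suggested in the quoted message below: it lists every poi*.jar under an install root together with the version encoded in its file name. The function names are hypothetical and not part of ManifoldCF, and the sketch assumes the version appears in the file name as in poi-3.17.jar.

```python
import fnmatch
import os
import re

# Version as it appears in jar file names such as poi-3.17.jar (an assumption
# about naming; jars without such a suffix are reported as "unknown").
VERSION = re.compile(r"-(\d+(?:\.\d+)+)")


def find_poi_jars(root):
    """Map each directory under root to the (jar name, version) pairs of its poi*.jar files."""
    found = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in sorted(filenames):
            if fnmatch.fnmatch(name, "poi*.jar"):
                match = VERSION.search(name)
                version = match.group(1) if match else "unknown"
                found.setdefault(dirpath, []).append((name, version))
    return found


def misplaced_dirs(found, expected="connector-common-lib"):
    """Return directories holding poi jars whose last path component is not `expected`."""
    return sorted(
        d for d in found
        if os.path.basename(os.path.normpath(d)) != expected
    )
```

Running `find_poi_jars` on the install root and passing the result to `misplaced_dirs` lists any directory other than connector-common-lib that contains a poi jar; mixed versions in the first result would point to two releases unpacked into the same tree.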

Karl


On Tue, Jan 9, 2018 at 10:02 AM, msaunier <[email protected]> wrote:

> The two versions (2.8.1 and 2.9) of ManifoldCF are on two different servers.
>
> *From:* Karl Wright [mailto:[email protected]]
> *Sent:* Tuesday, January 9, 2018 15:54
>
> *To:* [email protected]
> *Subject:* Re: Document connector excluding mime type and size - Tika
> Parser error
>
>
>
> As for the Tika issue, we explicitly tested documents of that type when
> rolling out 2.8.1.  When we updated 2.8.1 to a new Tika in 2.9 I believe we
> also tested this.
>
>
>
> One of the potential issues is that if you are dropping down different
> versions of ManifoldCF into the same directories you *might* have a poi*
> jar in the wrong place because of the way we had to do the patch.  Please
> have a look at where the poi* jars are in your directory structure; they
> should all be in one directory (connector-common-lib).  If you see any
> anywhere else, that's the cause of the issue.
>
>
>
> Karl
>
>
>
>
>
> On Tue, Jan 9, 2018 at 9:43 AM, Karl Wright <[email protected]> wrote:
>
> Since the Tika extractor essentially filters out the content mime type
> (other than presenting it as metadata), you need to put an "allowed
> documents" transformation connection into your job pipeline BEFORE the Tika
> connector:
>
>
>
> https://manifoldcf.apache.org/release/release-2.9/en_US/end-user-documentation.html#alloweddocuments
>
>
>
> In fact, mime type exclusion is actually disabled in the Solr output
> connector *unless* you are using the extracting update handler.  That
> should resolve the one problem for you.
>
>
>
> Thanks,
>
> Karl
>
>
>
>
>
> On Tue, Jan 9, 2018 at 9:35 AM, msaunier <[email protected]> wrote:
>
> The documents affected by the Tika error are:
>
> ·        Microsoft Word 97-2003
>
> ·        Application/msword
>
>
>
> I can't get more information; they are on SCO servers, and SCO does not
> have the ls -lisan or stat commands.
>
>
>
> For the Solr connection, I believe I emptied the index (both ManifoldCF
> and Solr) before the last indexing run. I will do it again to be sure.
>
>
>
>
>
> *From:* Karl Wright [mailto:[email protected]]
> *Sent:* Tuesday, January 9, 2018 15:26
>
> *To:* [email protected]
> *Subject:* Re: Document connector excluding mime type and size - Tika
> Parser error
>
>
>
> CONNECTORS-1482 is for the Solr connector filtering issue.  A question:
> When you changed these fields in the output connection, had you already
> indexed any documents?  Those would only get cleaned up if you did a
> subsequent full crawl, after you made the connection change.
>
>
>
> Karl
>
>
>
>
>
>
>
> On Tue, Jan 9, 2018 at 9:22 AM, Karl Wright <[email protected]> wrote:
>
> If you let me know what kind of file they are (extension and what
> application created them) that is probably good enough.
>
> Karl
>
>
>
> On Tue, Jan 9, 2018 at 9:19 AM, msaunier <[email protected]> wrote:
>
> Okay, good. I will see whether I can test Tika version 1.17.
>
>
>
> I can't send you a document that triggers this error; they are private. Sorry.
>
>
>
> If I encounter the error again on a non-private document, I'll come back
> to you.
>
>
>
>
>
>
>
> *From:* Karl Wright [mailto:[email protected]]
> *Sent:* Tuesday, January 9, 2018 15:12
>
> *To:* [email protected]
> *Subject:* Re: Document connector excluding mime type and size - Tika
> Parser error
>
>
>
> CONNECTORS-1481 is the ticket for the Tika problem.
>
>
>
> Karl
>
>
>
>
>
> On Tue, Jan 9, 2018 at 8:34 AM, Karl Wright <[email protected]> wrote:
>
> Ok, if you are in a position to build trunk, that's a newer version of
> Tika (1.17) which might (or might not) address this problem.
>
>
>
> If you could create a ticket, I'd greatly appreciate it if you attached
> one document that causes the failure.
>
>
>
> Thanks!
>
> Karl
>
>
>
>
>
> On Tue, Jan 9, 2018 at 8:02 AM, msaunier <[email protected]> wrote:
>
> It is version 2.9.
>
>
>
> I have a 2.8.1 install on another server with the same job and the same
> documents. I will test on that server and report back.
>
>
>
> Thanks for your help.
>
>
>
> *From:* Karl Wright [mailto:[email protected]]
> *Sent:* Tuesday, January 9, 2018 13:15
> *To:* [email protected]
> *Subject:* Re: Document connector excluding mime type and size - Tika
> Parser error
>
>
>
> I looked at the history of this.  We had to release a patch (2.8.1) that
> put various poi jars at root level in order to work around a Tika problem.
> That patch may not have been entirely correct in that it looks like it may
> have blocked access by one of the deeper jars to a higher level.
>
>
>
> Release 2.9 should fix this if I am correct.
>
>
>
> Karl
>
>
>
>
>
> On Tue, Jan 9, 2018 at 6:39 AM, Karl Wright <[email protected]> wrote:
>
> What version of MCF is this?  That's important to know since Tika has had
> problems with this kind of thing in the past and this looks like something
> similar.
>
>
>
> The problem you are reporting is due to either a missing jar, or a bug in
> an internal tika classloader.  But I need to know whether this is a current
> bug or not, since we just went to a new Tika version.
>
>
>
> Karl
>
>
>
>
>
> On Tue, Jan 9, 2018 at 4:32 AM, msaunier <[email protected]> wrote:
>
> Hello Karl,
>
> I hope you are well today.
>
>
>
> I have 2 problems with ManifoldCF.
>
>
>
> -----------
>
> In **Output connections**, with the Solr connector, I have added a
> "Maximum document length" and excluded 5 mime types, but it does not
> work. A screenshot is attached.
>
>
>
> ----------
>
> Second, I have a **Tika exception** in ManifoldCF. 3 documents are
> blocked:
>
>
>
> FATAL 2018-01-09T10:19:54,992 (Worker thread '5') - Error tossed: org.apache.poi.hwmf.record.HwmfFont.getCharSet()Lorg/apache/poi/hwmf/record/HwmfFont$WmfCharset;
> java.lang.NoSuchMethodError: org.apache.poi.hwmf.record.HwmfFont.getCharSet()Lorg/apache/poi/hwmf/record/HwmfFont$WmfCharset;
>         at org.apache.tika.parser.microsoft.WMFParser.parse(WMFParser.java:74) ~[?:?]
>         at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280) ~[?:?]
>         at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280) ~[?:?]
>         at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:135) ~[?:?]
>         at org.apache.tika.parser.DelegatingParser.parse(DelegatingParser.java:72) ~[?:?]
>         at org.apache.tika.extractor.ParsingEmbeddedDocumentExtractor.parseEmbedded(ParsingEmbeddedDocumentExtractor.java:102) ~[?:?]
>         at org.apache.tika.parser.microsoft.ooxml.AbstractOOXMLExtractor.handleEmbeddedFile(AbstractOOXMLExtractor.java:375) ~[?:?]
>         at org.apache.tika.parser.microsoft.ooxml.AbstractOOXMLExtractor.handleEmbeddedPart(AbstractOOXMLExtractor.java:260) ~[?:?]
>         at org.apache.tika.parser.microsoft.ooxml.AbstractOOXMLExtractor.handleEmbeddedParts(AbstractOOXMLExtractor.java:205) ~[?:?]
>         at org.apache.tika.parser.microsoft.ooxml.AbstractOOXMLExtractor.getXHTML(AbstractOOXMLExtractor.java:142) ~[?:?]
>         at org.apache.tika.parser.microsoft.ooxml.OOXMLExtractorFactory.parse(OOXMLExtractorFactory.java:142) ~[?:?]
>         at org.apache.tika.parser.microsoft.ooxml.OOXMLParser.parse(OOXMLParser.java:106) ~[?:?]
>         at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280) ~[?:?]
>         at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280) ~[?:?]
>         at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:135) ~[?:?]
>         at org.apache.manifoldcf.agents.transformation.tika.TikaParser.parse(TikaParser.java:74) ~[?:?]
>         at org.apache.manifoldcf.agents.transformation.tika.TikaExtractor.addOrReplaceDocumentWithException(TikaExtractor.java:235) ~[?:?]
>         at org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester$PipelineAddEntryPoint.addOrReplaceDocumentWithException(IncrementalIngester.java:3226) ~[mcf-agents.jar:?]
>         at org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester$PipelineAddFanout.sendDocument(IncrementalIngester.java:3077) ~[mcf-agents.jar:?]
>         at org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester$PipelineObjectWithVersions.addOrReplaceDocumentWithException(IncrementalIngester.java:2708) ~[mcf-agents.jar:?]
>         at org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester.documentIngest(IncrementalIngester.java:756) ~[mcf-agents.jar:?]
>         at org.apache.manifoldcf.crawler.system.WorkerThread$ProcessActivity.ingestDocumentWithException(WorkerThread.java:1583) ~[mcf-pull-agent.jar:?]
>         at org.apache.manifoldcf.crawler.system.WorkerThread$ProcessActivity.ingestDocumentWithException(WorkerThread.java:1548) ~[mcf-pull-agent.jar:?]
>         at org.apache.manifoldcf.crawler.connectors.sharedrive.SharedDriveConnector.processDocuments(SharedDriveConnector.java:939) ~[?:?]
>         at org.apache.manifoldcf.crawler.system.WorkerThread.run(WorkerThread.java:399) [mcf-pull-agent.jar:?]
>
>
>
> Do I need to create a ticket for this?
>
>
>
> ----------
>
>
>
> Thanks for your help.
>
>
>
> Best regards,
>
>
>