Did this work for you?
Karl

On Thu, Jan 11, 2018 at 6:36 AM, Karl Wright <[email protected]> wrote:

> If you need the jcifs connector, run "ant make-deps" too.  Then run "ant
> build" again.
>
> Karl
>
> On Thu, Jan 11, 2018 at 4:30 AM, msaunier <[email protected]> wrote:
>
>> Hello Karl,
>>
>>
>>
>> I have built and configured it, but the WindowsShare connector does not
>> appear in the list of repository connectors.
>>
>>
>>
>> ·        I added jcifs.jar to the connectors/jcifs/lib-proprietary
>> directory
>>
>> ·        I ran ant make-core-deps
>>
>> ·        I ran ant build
>>
>> ·        I uncommented the Windows share entry in the
>> connectors-proprietary.xml file in the dist folder
>>
>> ·        I added jcifs.jar to connector-lib-proprietary
>>
>>
>>
>> But the connector is still not offered in the ManifoldCF interface.
>>
>>
>>
>> Any ideas?
>>
>> Thanks.
>>
>>
>>
>>
>>
>> *From:* msaunier [mailto:[email protected]]
>> *Sent:* Wednesday, January 10, 2018 18:15
>> *To:* [email protected]
>> *Subject:* RE: Document connector excluding mime type and size - Tika
>> Parser error
>>
>>
>>
>> Good!
>>
>>
>>
>> I will configure and test that.
>>
>> I will report back as soon as the crawl has finished.
>>
>> It covers 400k documents.
>>
>>
>>
>> If it works, I will test on a few million documents.
>>
>>
>>
>> Thanks.
>>
>>
>>
>>
>>
>> *From:* Karl Wright [mailto:[email protected] <[email protected]>]
>> *Sent:* Wednesday, January 10, 2018 17:45
>> *To:* [email protected]
>> *Subject:* Re: Document connector excluding mime type and size - Tika
>> Parser error
>>
>>
>>
>> The build you should be using is the ant build.  Do not use the maven
>> build for this purpose.
>>
>>
>>
>> - Check out trunk:
>>
>>
>>
>> svn co https://svn.apache.org/repos/asf/manifoldcf/trunk
>>
>>
>>
>> - Download dependencies:
>>
>>
>>
>> ant make-core-deps
>>
>>
>>
>> - Build:
>>
>>
>>
>> ant build
>>
>>
>>
>> - Your deliverable is in the "dist" directory
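The build sequence above can be sketched as a small Python helper. This is only an illustration under stated assumptions: the function names `build_commands` and `run_build` are hypothetical, `ant` is assumed to be on the PATH, and the optional `ant make-deps` step reflects the follow-up advice in this thread for connectors such as jcifs.

```python
import subprocess


def build_commands(with_proprietary_deps=False):
    """Return the ant invocations in the order described in this thread."""
    commands = [["ant", "make-core-deps"]]
    if with_proprietary_deps:
        # Extra step mentioned in the follow-up reply for the jcifs connector.
        commands.append(["ant", "make-deps"])
    commands.append(["ant", "build"])
    return commands


def run_build(checkout_dir, with_proprietary_deps=False):
    """Run each command inside the trunk checkout, stopping at the first failure."""
    for command in build_commands(with_proprietary_deps):
        # check=True raises CalledProcessError if ant exits with a non-zero status.
        subprocess.run(command, cwd=checkout_dir, check=True)
```

Calling `run_build("trunk", with_proprietary_deps=True)` would run the three steps in order from a checkout directory named `trunk` (the directory name is an assumption).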
>>
>>
>>
>> Karl
>>
>>
>>
>>
>>
>> On Wed, Jan 10, 2018 at 11:37 AM, msaunier <[email protected]> wrote:
>>
>> I got an error with the maven build, so I tested with an external Tika
>> 1.17 server instead, but POI is not included. If you manage to run mvn
>> package successfully with Tika 1.17, I am interested.
>>
>>
>>
>> I have not had much time to deal with it today.
>>
>>
>>
>> I found some bugs that I will report tomorrow if they have not already
>> been reported. They concern log4j2, local_fr, and a bug with the web
>> interface and the keyboard input key.
>>
>>
>>
>> I am continuing my investigation.
>>
>>
>>
>> *From:* Karl Wright [mailto:[email protected]]
>> *Sent:* Wednesday, January 10, 2018 17:15
>> *To:* [email protected]
>> *Subject:* Re: Document connector excluding mime type and size - Tika
>> Parser error
>>
>>
>>
>> Any news?
>>
>> Karl
>>
>>
>>
>> On Tue, Jan 9, 2018 at 1:10 PM, Karl Wright <[email protected]> wrote:
>>
>> Let me know what happens.
>> If it works for you, I'll see if we can put together a patch release of
>> 2.9 with the fix.
>>
>>
>>
>> Karl
>>
>>
>>
>>
>>
>> On Tue, Jan 9, 2018 at 11:07 AM, msaunier <[email protected]> wrote:
>>
>> Test a checkout and build with POI 3.17 and Tika 1.17?
>>
>>
>>
>> That is possible.
>>
>>
>>
>> I will finish a project first and then test that.
>>
>>
>>
>> *From:* Karl Wright [mailto:[email protected]]
>> *Sent:* Tuesday, January 9, 2018 16:57
>> *To:* [email protected]
>> *Subject:* Re: Document connector excluding mime type and size - Tika
>> Parser error
>>
>>
>>
>> So here's the problem; we used POI 3.17 with Tika 3.16 in 2.9, in order
>> to deal with the classloader issue present in POI 3.15, and because POI
>> 3.16 has a severe security issue that made it impossible to ship with.
>>
>>
>>
>> Unfortunately that doesn't quite work; POI 3.17 is not backwards
>> compatible with 3.16 completely and therefore problems occur with this
>> combination.
>>
>>
>>
>> The probable solution is to check out and build trunk and see if that
>> works for you.  It very well might.  The question then is what to do next,
>> because we are not scheduled to release again until April.  We might have
>> to do a point release to deal with this.
>>
>>
>>
>> Please give it a try and let me know what happens.
>>
>>
>>
>> Thanks,
>>
>> Karl
>>
>>
>>
>>
>>
>> On Tue, Jan 9, 2018 at 10:29 AM, Karl Wright <[email protected]> wrote:
>>
>> Ok, never mind that last email.  We patched it in part in 2.9 by
>> including the latest POI.  So clearly it's still an existing problem in
>> POI.  I'll have to open a ticket there and await a patch from them.
>>
>>
>>
>> Karl
>>
>>
>>
>> On Tue, Jan 9, 2018 at 10:27 AM, Karl Wright <[email protected]> wrote:
>>
>> This screenshot cannot be MCF 2.9 since the version of poi was not 3.17
>> for the 2.9 release.
>>
>>
>>
>> Karl
>>
>>
>>
>>
>>
>> On Tue, Jan 9, 2018 at 10:02 AM, msaunier <[email protected]> wrote:
>>
>> The 2 versions (2.8.1 and 2.9) of ManifoldCF are on 2 different servers.
>>
>>
>>
>>
>>
>>
>>
>>
>>
>> *From:* Karl Wright [mailto:[email protected]]
>> *Sent:* Tuesday, January 9, 2018 15:54
>> *To:* [email protected]
>> *Subject:* Re: Document connector excluding mime type and size - Tika
>> Parser error
>>
>>
>>
>> As for the Tika issue, we explicitly tested documents of that type when
>> rolling out 2.8.1.  When we updated 2.8.1 to a new Tika in 2.9 I believe we
>> also tested this.
>>
>>
>>
>> One of the potential issues is that if you are dropping down different
>> versions of ManifoldCF into the same directories you *might* have a poi*
>> jar in the wrong place because of the way we had to do the patch.  Please
>> have a look at where the poi* jars are in your directory structure; they
>> should all be in one directory (connector-common-lib).  If you see any
>> anywhere else, that's the cause of the issue.
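A minimal sketch of the check described above, assuming a Python interpreter is available on the server. The helper name `find_misplaced_poi_jars` is hypothetical, and the only fact taken from this thread is that all poi* jars are expected in a directory named connector-common-lib.

```python
from pathlib import Path


def find_misplaced_poi_jars(install_root, expected_dir="connector-common-lib"):
    """Return poi*.jar files whose parent directory is not `expected_dir`."""
    misplaced = []
    # rglob walks the whole tree below install_root.
    for jar in sorted(Path(install_root).rglob("poi*.jar")):
        if jar.parent.name != expected_dir:
            misplaced.append(jar)
    return misplaced
```

An empty result for the ManifoldCF install directory would suggest that no stray POI jar is left over from an earlier version; any path it returns is a candidate for the conflict.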
>>
>>
>>
>> Karl
>>
>>
>>
>>
>>
>> On Tue, Jan 9, 2018 at 9:43 AM, Karl Wright <[email protected]> wrote:
>>
>> Since the Tika extractor essentially filters out the content mime type
>> (other than presenting it as metadata), you need to put an "allowed
>> documents" transformation connection into your job pipeline BEFORE the Tika
>> connector:
>>
>>
>>
>> https://manifoldcf.apache.org/release/release-2.9/en_US/end-user-documentation.html#alloweddocuments
>>
>>
>>
>> In fact, mime type exclusion is actually disabled in the Solr output
>> connector *unless* you are using the extracting update handler.  That
>> should resolve the one problem for you.
>>
>>
>>
>> Thanks,
>>
>> Karl
>>
>>
>>
>>
>>
>> On Tue, Jan 9, 2018 at 9:35 AM, msaunier <[email protected]> wrote:
>>
>> The documents that fail in Tika are:
>>
>> ·        Microsoft Word 97-2003
>>
>> ·        application/msword
>>
>>
>>
>> I can’t get more information; they are on SCO servers, and SCO does not
>> have the ls -lisan or stat commands.
>>
>>
>>
>> For the Solr connection, I believe I emptied the index before the last
>> indexing run (both ManifoldCF and Solr). I will do it again to be sure.
>>
>>
>>
>>
>>
>> *From:* Karl Wright [mailto:[email protected]]
>> *Sent:* Tuesday, January 9, 2018 15:26
>> *To:* [email protected]
>> *Subject:* Re: Document connector excluding mime type and size - Tika
>> Parser error
>>
>>
>>
>> CONNECTORS-1482 is for the Solr connector filtering issue.  A question:
>> When you changed these fields in the output connection, had you already
>> indexed any documents?  Those would only get cleaned up if you did a
>> subsequent full crawl, after you made the connection change.
>>
>>
>>
>> Karl
>>
>>
>>
>>
>>
>>
>>
>> On Tue, Jan 9, 2018 at 9:22 AM, Karl Wright <[email protected]> wrote:
>>
>> If you let me know what kind of file they are (extension and what
>> application created them) that is probably good enough.
>>
>> Karl
>>
>>
>>
>> On Tue, Jan 9, 2018 at 9:19 AM, msaunier <[email protected]> wrote:
>>
>> Okay, good. I will see whether I can test Tika version 1.17.
>>
>>
>>
>> I can’t send a document that triggers this error; they are private. Sorry.
>>
>>
>>
>> If I encounter the error again on a non-private document, I'll come back
>> to you.
>>
>>
>>
>>
>>
>>
>>
>> *From:* Karl Wright [mailto:[email protected]]
>> *Sent:* Tuesday, January 9, 2018 15:12
>> *To:* [email protected]
>> *Subject:* Re: Document connector excluding mime type and size - Tika
>> Parser error
>>
>>
>>
>> CONNECTORS-1481 is the ticket for the Tika problem.
>>
>>
>>
>> Karl
>>
>>
>>
>>
>>
>> On Tue, Jan 9, 2018 at 8:34 AM, Karl Wright <[email protected]> wrote:
>>
>> Ok, if you are in a position to build trunk, that's a newer version of
>> Tika (1.17) which might (or might not) address this problem.
>>
>>
>>
>> If you could create a ticket, I'd greatly appreciate it if you attached
>> one document to it that causes the failure.
>>
>>
>>
>> Thanks!
>>
>> Karl
>>
>>
>>
>>
>>
>> On Tue, Jan 9, 2018 at 8:02 AM, msaunier <[email protected]> wrote:
>>
>> It’s version 2.9.
>>
>>
>>
>> I have a 2.8.1 on another server with the same job and the same documents.
>> I will test on that server and report back.
>>
>>
>>
>> Thanks for your help.
>>
>>
>>
>> *From:* Karl Wright [mailto:[email protected]]
>> *Sent:* Tuesday, January 9, 2018 13:15
>> *To:* [email protected]
>> *Subject:* Re: Document connector excluding mime type and size - Tika
>> Parser error
>>
>>
>>
>> I looked at the history of this.  We had to release a patch (2.8.1) that
>> put various poi jars at root level in order to work around a Tika problem.
>> That patch may not have been entirely correct in that it looks like it may
>> have blocked access by one of the deeper jars to a higher level.
>>
>>
>>
>> Release 2.9 should fix this if I am correct.
>>
>>
>>
>> Karl
>>
>>
>>
>>
>>
>> On Tue, Jan 9, 2018 at 6:39 AM, Karl Wright <[email protected]> wrote:
>>
>> What version of MCF is this?  That's important to know since Tika has had
>> problems with this kind of thing in the past and this looks like something
>> similar.
>>
>>
>>
>> The problem you are reporting is due to either a missing jar, or a bug in
>> an internal tika classloader.  But I need to know whether this is a current
>> bug or not, since we just went to a new Tika version.
>>
>>
>>
>> Karl
>>
>>
>>
>>
>>
>> On Tue, Jan 9, 2018 at 4:32 AM, msaunier <[email protected]> wrote:
>>
>> Hello Karl,
>>
>> I hope you are well today.
>>
>>
>>
>> I have 2 problems with ManifoldCF.
>>
>>
>>
>> -----------
>>
>> In **Output connectors**, with the Solr connector: I have set a « Maximum
>> document length » and I have « Excluded 5 mime types », but it does not
>> work. I have attached a screenshot.
>>
>>
>>
>> ----------
>>
>> Second, I have a **Tika exception** in ManifoldCF. 3 documents
>> are blocked:
>>
>>
>>
>> FATAL 2018-01-09T10:19:54,992 (Worker thread '5') - Error tossed: org.apache.poi.hwmf.record.HwmfFont.getCharSet()Lorg/apache/poi/hwmf/record/HwmfFont$WmfCharset;
>> java.lang.NoSuchMethodError: org.apache.poi.hwmf.record.HwmfFont.getCharSet()Lorg/apache/poi/hwmf/record/HwmfFont$WmfCharset;
>>         at org.apache.tika.parser.microsoft.WMFParser.parse(WMFParser.java:74) ~[?:?]
>>         at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280) ~[?:?]
>>         at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280) ~[?:?]
>>         at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:135) ~[?:?]
>>         at org.apache.tika.parser.DelegatingParser.parse(DelegatingParser.java:72) ~[?:?]
>>         at org.apache.tika.extractor.ParsingEmbeddedDocumentExtractor.parseEmbedded(ParsingEmbeddedDocumentExtractor.java:102) ~[?:?]
>>         at org.apache.tika.parser.microsoft.ooxml.AbstractOOXMLExtractor.handleEmbeddedFile(AbstractOOXMLExtractor.java:375) ~[?:?]
>>         at org.apache.tika.parser.microsoft.ooxml.AbstractOOXMLExtractor.handleEmbeddedPart(AbstractOOXMLExtractor.java:260) ~[?:?]
>>         at org.apache.tika.parser.microsoft.ooxml.AbstractOOXMLExtractor.handleEmbeddedParts(AbstractOOXMLExtractor.java:205) ~[?:?]
>>         at org.apache.tika.parser.microsoft.ooxml.AbstractOOXMLExtractor.getXHTML(AbstractOOXMLExtractor.java:142) ~[?:?]
>>         at org.apache.tika.parser.microsoft.ooxml.OOXMLExtractorFactory.parse(OOXMLExtractorFactory.java:142) ~[?:?]
>>         at org.apache.tika.parser.microsoft.ooxml.OOXMLParser.parse(OOXMLParser.java:106) ~[?:?]
>>         at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280) ~[?:?]
>>         at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280) ~[?:?]
>>         at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:135) ~[?:?]
>>         at org.apache.manifoldcf.agents.transformation.tika.TikaParser.parse(TikaParser.java:74) ~[?:?]
>>         at org.apache.manifoldcf.agents.transformation.tika.TikaExtractor.addOrReplaceDocumentWithException(TikaExtractor.java:235) ~[?:?]
>>         at org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester$PipelineAddEntryPoint.addOrReplaceDocumentWithException(IncrementalIngester.java:3226) ~[mcf-agents.jar:?]
>>         at org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester$PipelineAddFanout.sendDocument(IncrementalIngester.java:3077) ~[mcf-agents.jar:?]
>>         at org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester$PipelineObjectWithVersions.addOrReplaceDocumentWithException(IncrementalIngester.java:2708) ~[mcf-agents.jar:?]
>>         at org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester.documentIngest(IncrementalIngester.java:756) ~[mcf-agents.jar:?]
>>         at org.apache.manifoldcf.crawler.system.WorkerThread$ProcessActivity.ingestDocumentWithException(WorkerThread.java:1583) ~[mcf-pull-agent.jar:?]
>>         at org.apache.manifoldcf.crawler.system.WorkerThread$ProcessActivity.ingestDocumentWithException(WorkerThread.java:1548) ~[mcf-pull-agent.jar:?]
>>         at org.apache.manifoldcf.crawler.connectors.sharedrive.SharedDriveConnector.processDocuments(SharedDriveConnector.java:939) ~[?:?]
>>         at org.apache.manifoldcf.crawler.system.WorkerThread.run(WorkerThread.java:399) [mcf-pull-agent.jar:?]
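A NoSuchMethodError like the one in this trace generally means the class loaded at runtime comes from a different library version than the one the caller was compiled against. One way to investigate is to list every jar that ships the class in question. The sketch below is only an illustration: the helper name `jars_containing_class` is hypothetical, and it relies on nothing beyond the fact that jar files are zip archives.

```python
import zipfile
from pathlib import Path


def jars_containing_class(root, class_name):
    """List jars under `root` that contain the given fully qualified class."""
    # A class a.b.C is stored in a jar as the entry a/b/C.class.
    entry = class_name.replace(".", "/") + ".class"
    matches = []
    for jar in sorted(Path(root).rglob("*.jar")):
        try:
            with zipfile.ZipFile(jar) as archive:
                if entry in archive.namelist():
                    matches.append(jar)
        except zipfile.BadZipFile:
            # Skip files that are named *.jar but are not valid archives.
            continue
    return matches
```

For this trace, one would call `jars_containing_class(install_dir, "org.apache.poi.hwmf.record.HwmfFont")`, where `install_dir` is the ManifoldCF install path. More than one match, or a match in an unexpected directory, would point to a version conflict.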
>>
>>
>>
>> Do I need to create a ticket?
>>
>>
>>
>> ----------
>>
>>
>>
>> Thanks for your help.
>>
>>
>>
>> Regards,