Hi Maxence,

The following error:

>>>>>>

FATAL 2018-07-26T11:30:32,220 (Worker thread '28') - Error tossed: org/apache/poi/POIXMLTextExtractor
java.lang.NoClassDefFoundError: org/apache/poi/POIXMLTextExtractor
        at org.apache.tika.parser.microsoft.ooxml.OOXMLParser.parse(OOXMLParser.java:106) ~[?:?]
        at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280) ~[?:?]
        at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280) ~[?:?]
        at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:143) ~[?:?]
        at org.apache.manifoldcf.agents.transformation.tika.TikaParser.parse(TikaParser.java:74) ~[?:?]
        at org.apache.manifoldcf.agents.transformation.tika.TikaExtractor.addOrReplaceDocumentWithException(TikaExtractor.java:235) ~[?:?]

<<<<<<

.... seems to be the result of installing new POI jars that are not fully
compatible with the version of Tika that's there.  Unfortunately, I cannot
think of any way to address this right now.  Tika's dependencies are legion
and they change all the time.

The only thing we can really do is wait for: (1) POI to release their new
software, and then (2) Tika to publish a new release that depends on it.
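
In the meantime, if you want to confirm which jars in connector-common-lib
(if any) actually contain the class named in the error, a small stand-alone
check along these lines might help.  This is only a sketch: the class name
JarClassFinder and its method are made up for illustration and are not part
of ManifoldCF, Tika, or POI, and the default directory is just an assumption
about where you run it from.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.jar.JarFile;

// Hypothetical helper for illustration only; not part of ManifoldCF, Tika, or POI.
public class JarClassFinder {

    // Returns the sorted file names of all jars directly inside libDir
    // that contain an entry for the given fully qualified class name.
    static List<String> findJarsContaining(Path libDir, String className) throws IOException {
        // A class a.b.C is stored inside a jar as the entry a/b/C.class
        String entryName = className.replace('.', '/') + ".class";
        List<String> matches = new ArrayList<>();
        try (DirectoryStream<Path> jars = Files.newDirectoryStream(libDir, "*.jar")) {
            for (Path jar : jars) {
                try (JarFile jarFile = new JarFile(jar.toFile())) {
                    if (jarFile.getEntry(entryName) != null) {
                        matches.add(jar.getFileName().toString());
                    }
                }
            }
        }
        Collections.sort(matches);
        return matches;
    }

    public static void main(String[] args) throws IOException {
        // Assumed defaults: the lib directory relative to the current directory,
        // and the class reported in the NoClassDefFoundError above.
        Path libDir = Paths.get(args.length > 0 ? args[0] : "connector-common-lib");
        String className = args.length > 1 ? args[1] : "org.apache.poi.POIXMLTextExtractor";
        List<String> matches = findJarsContaining(libDir, className);
        if (matches.isEmpty()) {
            System.out.println(className + " not found in any jar under " + libDir);
        } else {
            System.out.println(className + " found in: " + matches);
        }
    }
}
```

Compile it with javac and run it with the path to connector-common-lib as the
first argument and the class name as the second.  If no jar is listed, the
POI jars in that directory do not provide the class Tika is asking for, which
would be consistent with the version mismatch described above.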

Karl


On Thu, Jul 26, 2018 at 5:33 AM msaunier <[email protected]> wrote:

> Hello Karl,
>
>
>
> For the moment, it is working.
>
>
>
> I get these errors, but they are not FATAL:
>
>
>
> DEBUG 2018-07-26T11:30:32,220 (Worker thread '4') - JCIFS: Checking '*'
> against '/69B_citya_barioz_immobilier/02894_berthollier/Formation/'
>
> DEBUG 2018-07-26T11:30:32,220 (Worker thread '4') - JCIFS: Match found.
>
> DEBUG 2018-07-26T11:30:32,220 (Worker thread '4') - JCIFS: Leaving
> checkInclude for
> 'smb://srv-fichiersqg/Social/_SOCIAL_CABINETS/69B_citya_barioz_immobilier/02894_berthollier/Formation/'
>
> DEBUG 2018-07-26T11:30:32,220 (Worker thread '4') - JCIFS: Recorded path
> is
> 'smb://srv-fichiersqg/Social/_SOCIAL_CABINETS/69B_citya_barioz_immobilier/02894_berthollier/Formation/'
> and is included.
>
> FATAL 2018-07-26T11:30:32,220 (Worker thread '28') - Error tossed:
> org/apache/poi/POIXMLTextExtractor
>
> java.lang.NoClassDefFoundError: org/apache/poi/POIXMLTextExtractor
>
>         at
> org.apache.tika.parser.microsoft.ooxml.OOXMLParser.parse(OOXMLParser.java:106)
> ~[?:?]
>
>         at
> org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
> ~[?:?]
>
>         at
> org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
> ~[?:?]
>
>         at
> org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:143)
> ~[?:?]
>
>         at
> org.apache.manifoldcf.agents.transformation.tika.TikaParser.parse(TikaParser.java:74)
> ~[?:?]
>
>         at
> org.apache.manifoldcf.agents.transformation.tika.TikaExtractor.addOrReplaceDocumentWithException(TikaExtractor.java:235)
> ~[?:?]
>
>         at
> org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester$PipelineAddEntryPoint.addOrReplaceDocumentWithException(IncrementalIngester.java:3226)
> ~[mcf-agents.jar:?]
>
>         at
> org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester$PipelineAddFanout.sendDocument(IncrementalIngester.java:3077)
> ~[mcf-agents.jar:?]
>
>         at
> org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester$MonitoredAddActivityWrapper.sendDocument(IncrementalIngester.java:3471)
> ~[mcf-agents.jar:?]
>
>         at
> org.apache.manifoldcf.agents.transformation.contentlimiter.ContentLimiter.addOrReplaceDocumentWithException(ContentLimiter.java:161)
> ~[?:?]
>
>         at
> org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester$PipelineAddEntryPoint.addOrReplaceDocumentWithException(IncrementalIngester.java:3226)
> ~[mcf-agents.jar:?]
>
>         at
> org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester$PipelineAddFanout.sendDocument(IncrementalIngester.java:3077)
> ~[mcf-agents.jar:?]
>
>         at
> org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester$PipelineObjectWithVersions.addOrReplaceDocumentWithException(IncrementalIngester.java:2708)
> ~[mcf-agents.jar:?]
>
>         at
> org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester.documentIngest(IncrementalIngester.java:756)
> ~[mcf-agents.jar:?]
>
>         at
> org.apache.manifoldcf.crawler.system.WorkerThread$ProcessActivity.ingestDocumentWithException(WorkerThread.java:1583)
> ~[mcf-pull-agent.jar:?]
>
>         at
> org.apache.manifoldcf.crawler.system.WorkerThread$ProcessActivity.ingestDocumentWithException(WorkerThread.java:1548)
> ~[mcf-pull-agent.jar:?]
>
>         at
> org.apache.manifoldcf.crawler.connectors.sharedrive.SharedDriveConnector.processDocuments(SharedDriveConnector.java:939)
> ~[?:?]
>
>         at
> org.apache.manifoldcf.crawler.system.WorkerThread.run(WorkerThread.java:399)
> [mcf-pull-agent.jar:?]
>
> Caused by: java.lang.ClassNotFoundException:
> org.apache.poi.POIXMLTextExtractor
>
>         at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
> ~[?:1.8.0_171]
>
>         at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
> ~[?:1.8.0_171]
>
>         at
> java.net.FactoryURLClassLoader.loadClass(URLClassLoader.java:814)
> ~[?:1.8.0_171]
>
>         at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
> ~[?:1.8.0_171]
>
>         ... 18 more
>
> AND
>
>
>
> Starting crawler...
>
> Jul 26, 2018 11:29:01 AM
> org.apache.tika.config.InitializableProblemHandler$3
> handleInitializableProblem
>
> WARNING: JBIG2ImageReader not loaded. jbig2 files will be ignored
>
> See https://pdfbox.apache.org/2.0/dependencies.html#jai-image-io
>
> for optional dependencies.
>
> TIFFImageWriter not loaded. tiff files will not be processed
>
> See https://pdfbox.apache.org/2.0/dependencies.html#jai-image-io
>
> for optional dependencies.
>
> J2KImageReader not loaded. JPEG2000 files will not be processed.
>
> See https://pdfbox.apache.org/2.0/dependencies.html#jai-image-io
>
> for optional dependencies.
>
>
>
> Jul 26, 2018 11:29:01 AM
> org.apache.tika.config.InitializableProblemHandler$3
> handleInitializableProblem
>
> WARNING: org.xerial's sqlite-jdbc is not loaded.
>
> Please provide the jar on your classpath to parse sqlite files.
>
> See tika-parsers/pom.xml for the correct version.
>
>
>
> Maxence,
>
>
>
>
>
>
>
> *From:* Karl Wright [mailto:[email protected]]
> *Sent:* Wednesday, July 25, 2018 19:09
> *To:* [email protected]
> *Subject:* Re: Out of memory, one file bug i think
>
>
>
> That's what I was afraid of.  The new poi jars have dependencies we
> haven't accounted for yet.
>
>
>
> Can you download apache-commons-compress jar (latest version should be OK)
> and also put that in connector-common-lib?  Thanks!!
>
>
>
> Karl
>
>
>
>
>
> On Wed, Jul 25, 2018 at 1:01 PM msaunier <[email protected]> wrote:
>
> Hi Karl,
>
>
>
> I have added the snapshot and now I am flooded with this error:
>
>
>
> FATAL 2018-07-25T16:43:04,599 (Worker thread '0') - Error tossed:
> org/apache/commons/compress/utils/InputStreamStatistics
>
> java.lang.NoClassDefFoundError:
> org/apache/commons/compress/utils/InputStreamStatistics
>
>         at
> org.apache.poi.openxml4j.util.ZipArchiveThresholdInputStream.<init>(ZipArchiveThresholdInputStream.java:62)
> ~[?:?]
>
>         at
> org.apache.poi.openxml4j.util.ZipSecureFile.getInputStream(ZipSecureFile.java:147)
> ~[?:?]
>
>         at
> org.apache.poi.openxml4j.util.ZipSecureFile.getInputStream(ZipSecureFile.java:34)
> ~[?:?]
>
>         at
> org.apache.poi.openxml4j.util.ZipFileZipEntrySource.getInputStream(ZipFileZipEntrySource.java:66)
> ~[?:?]
>
>         at
> org.apache.poi.openxml4j.opc.ZipPackage.getPartsImpl(ZipPackage.java:255)
> ~[?:?]
>
>         at
> org.apache.poi.openxml4j.opc.OPCPackage.getParts(OPCPackage.java:725) ~[?:?]
>
>         at
> org.apache.poi.openxml4j.opc.OPCPackage.open(OPCPackage.java:238) ~[?:?]
>
>         at
> org.apache.tika.parser.pkg.ZipContainerDetector.detectOPCBased(ZipContainerDetector.java:197)
> ~[?:?]
>
>         at
> org.apache.tika.parser.pkg.ZipContainerDetector.detectZipFormat(ZipContainerDetector.java:127)
> ~[?:?]
>
>         at
> org.apache.tika.parser.pkg.ZipContainerDetector.detect(ZipContainerDetector.java:88)
> ~[?:?]
>
>         at
> org.apache.tika.detect.CompositeDetector.detect(CompositeDetector.java:84)
> ~[?:?]
>
>         at
> org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:116)
> ~[?:?]
>
>         at
> org.apache.manifoldcf.agents.transformation.tika.TikaParser.parse(TikaParser.java:74)
> ~[?:?]
>
>         at
> org.apache.manifoldcf.agents.transformation.tika.TikaExtractor.addOrReplaceDocumentWithException(TikaExtractor.java:235)
> ~[?:?]
>
>         at
> org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester$PipelineAddEntryPoint.addOrReplaceDocumentWithException(IncrementalIngester.java:3226)
> ~[mcf-agents.jar:?]
>
>         at
> org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester$PipelineAddFanout.sendDocument(IncrementalIngester.java:3077)
> ~[mcf-agents.jar:?]
>
>         at
> org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester$PipelineObjectWithVersions.addOrReplaceDocumentWithException(IncrementalIngester.java:2708)
> ~[mcf-agents.jar:?]
>
>         at
> org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester.documentIngest(IncrementalIngester.java:756)
> ~[mcf-agents.jar:?]
>
>         at
> org.apache.manifoldcf.crawler.system.WorkerThread$ProcessActivity.ingestDocumentWithException(WorkerThread.java:1583)
> ~[mcf-pull-agent.jar:?]
>
>         at
> org.apache.manifoldcf.crawler.system.WorkerThread$ProcessActivity.ingestDocumentWithException(WorkerThread.java:1548)
> ~[mcf-pull-agent.jar:?]
>
>         at
> org.apache.manifoldcf.crawler.connectors.sharedrive.SharedDriveConnector.processDocuments(SharedDriveConnector.java:939)
> ~[?:?]
>
>         at
> org.apache.manifoldcf.crawler.system.WorkerThread.run(WorkerThread.java:399)
> [mcf-pull-agent.jar:?]
>
>
>
> Maxence,
>
>
>
>
>
> *From:* Karl Wright [mailto:[email protected]]
> *Sent:* Wednesday, July 25, 2018 13:12
> *To:* [email protected]
> *Subject:* Re: Out of memory, one file bug i think
>
>
>
> Hi Maxence,
>
>
>
> Tomorrow (7/26) the POI project will be delivering a nightly build which
> should repair the Class Not Found exceptions.  You will need to download it
> here:
>
>
> https://builds.apache.org/view/P/view/POI/job/POI-DSL-1.8/lastSuccessfulBuild/artifact/build/dist/
>
>
>
> ... and replace all poi jars with the corresponding ones from the binary
> distribution.  I believe the poi jars are all in connector-common-lib.  Be
> sure to delete the old ones (or move them somewhere else) first.
>
>
>
> I don't know whether this will fix your out of memory problem however.
> Please let me know what's still not working and I can take it from there.
>
>
>
> Karl
>
>
>
>
>
> On Wed, Jul 25, 2018 at 6:03 AM Karl Wright <[email protected]> wrote:
>
> Out of memory errors are fatal, I'm afraid, because they corrupt not only
> the document in question but all others being processed at the same time.
> So those cannot be ignored.
>
>
>
> Tika should ignore documents that it cannot process, however, and that is
> a great enhancement request for them.
>
>
>
> Karl
>
>
>
>
>
> On Wed, Jul 25, 2018 at 3:39 AM msaunier <[email protected]> wrote:
>
> Hi Karl,
>
>
>
> Okay. So today, I'm going to force ManifoldCF to run so that only the
> documents are left behind.
>
> In the future, could I ignore these errors? They make the application
> crash, which is not great behavior in production.
>
>
>
> Thanks
>
> Maxence,
>
>
>
>
>
> *From:* Karl Wright [mailto:[email protected]]
> *Sent:* Tuesday, July 24, 2018 17:53
> *To:* [email protected]
> *Subject:* Re: Out of memory, one file bug i think
>
>
>
> The problem isn't with images in general; it's with certain kinds of
> images.  There are optional dependencies in Tika for some kinds of images
> that we cannot include in the MCF distribution because of licensing
> problems.  I don't know which kinds these are but apparently you are trying
> to index some of them.
>
> You will need to find and download the right jar and put it in the
> connector-common-lib folder for this to work.
>
>
>
> Karl
>
>
>
>
>
> On Tue, Jul 24, 2018 at 11:36 AM msaunier <[email protected]> wrote:
>
> In another crawl I extract images with the same parameters and I do not
> have problems with images. They are indexed without errors. Images are
> necessary for this job. I will try to recreate my job and test.
>
>
>
> Thanks,
>
> Maxence,
>
>
>
>
>
>
>
>
>
> *From:* Karl Wright [mailto:[email protected]]
> *Sent:* Tuesday, July 24, 2018 17:32
> *To:* [email protected]
> *Subject:* Re: Out of memory, one file bug i think
>
>
>
> " java.lang.NoSuchMethodException:
> org.openxmlformats.schemas.wordprocessingml.x2006.main.impl.CTPictureBaseImpl.<init>(org.apache.xmlbeans.SchemaType,
> boolean)"
>
>
>
> This exception is occurring because you are trying to extract content from
> an image.  In order for this to work you need a jar that isn't supplied
> with Tika for licensing reasons.  Can you exclude images from your crawl?
>
>
>
> Karl
>
>
>
>
>
> On Tue, Jul 24, 2018 at 10:32 AM msaunier <[email protected]> wrote:
>
> Hi Karl,
>
>
>
> With only connector debugging enabled, I get this information:
>
>
>
> [Thread-269948] INFO org.apache.zookeeper.ZooKeeper - Initiating client
> connection, connectString=kemp-formation-solr:2181 sessionTimeout=60000
> watcher=org.apache.solr.common.cloud.SolrZkClient$3@3c351b22
>
> [Thread-269948-SendThread(kemp-formation-solr.citya.local:2181)] INFO
> org.apache.zookeeper.ClientCnxn - Opening socket connection to server
> kemp-formation-solr.citya.local/192.168.37.107:2181. Will not attempt to
> authenticate using SASL (unknown error)
>
> [Thread-269948-SendThread(kemp-formation-solr.citya.local:2181)] INFO
> org.apache.zookeeper.ClientCnxn - Socket connection established to
> kemp-formation-solr.citya.local/192.168.37.107:2181, initiating session
>
> [Thread-269948-SendThread(kemp-formation-solr.citya.local:2181)] INFO
> org.apache.zookeeper.ClientCnxn - Session establishment complete on server
> kemp-formation-solr.citya.local/192.168.37.107:2181, sessionid =
> 0xff00000201970049, negotiated timeout = 40000
>
> [Thread-269948] INFO org.apache.solr.common.cloud.ZkStateReader - Updated
> live nodes from ZooKeeper... (0) -> (2)
>
> [Thread-269948] INFO
> org.apache.solr.client.solrj.impl.ZkClientClusterStateProvider - Cluster at
> kemp-formation-solr:2181 ready
>
> java.lang.NoSuchMethodException:
> org.openxmlformats.schemas.wordprocessingml.x2006.main.impl.CTPictureBaseImpl.<init>(org.apache.xmlbeans.SchemaType,
> boolean)
>
>         at java.lang.Class.getConstructor0(Class.java:3082)
>
>         at java.lang.Class.getDeclaredConstructor(Class.java:2178)
>
>         at
> org.apache.xmlbeans.impl.schema.SchemaTypeImpl.getJavaImplConstructor2(SchemaTypeImpl.java:1817)
>
>         at
> org.apache.xmlbeans.impl.schema.SchemaTypeImpl.createUnattachedSubclass(SchemaTypeImpl.java:1961)
>
>         at
> org.apache.xmlbeans.impl.schema.SchemaTypeImpl.createUnattachedNode(SchemaTypeImpl.java:1950)
>
>         at
> org.apache.xmlbeans.impl.schema.SchemaTypeImpl.createElementType(SchemaTypeImpl.java:1051)
>
>         at
> org.apache.xmlbeans.impl.values.XmlObjectBase.create_element_user(XmlObjectBase.java:938)
>
>         at org.apache.xmlbeans.impl.store.Xobj.getUser(Xobj.java:1675)
>
>         at org.apache.xmlbeans.impl.store.Cur.getUser(Cur.java:2659)
>
>         at org.apache.xmlbeans.impl.store.Cur.getObject(Cur.java:2652)
>
>         at
> org.apache.xmlbeans.impl.store.Cursor._getObject(Cursor.java:995)
>
>         at
> org.apache.xmlbeans.impl.store.Cursor.getObject(Cursor.java:2904)
>
>         at
> org.apache.poi.xwpf.usermodel.XWPFDocument.onDocumentRead(XWPFDocument.java:162)
>
>         at org.apache.poi.POIXMLDocument.load(POIXMLDocument.java:169)
>
>         at
> org.apache.poi.xwpf.usermodel.XWPFDocument.<init>(XWPFDocument.java:112)
>
>         at
> org.apache.poi.xwpf.extractor.XWPFWordExtractor.<init>(XWPFWordExtractor.java:60)
>
>         at
> org.apache.poi.extractor.ExtractorFactory.createExtractor(ExtractorFactory.java:243)
>
>         at
> org.apache.tika.parser.microsoft.ooxml.OOXMLExtractorFactory.parse(OOXMLExtractorFactory.java:105)
>
>         at
> org.apache.tika.parser.microsoft.ooxml.OOXMLParser.parse(OOXMLParser.java:106)
>
>         at
> org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
>
>         at
> org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
>
>         at
> org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:143)
>
>         at
> org.apache.manifoldcf.agents.transformation.tika.TikaParser.parse(TikaParser.java:74)
>
>         at
> org.apache.manifoldcf.agents.transformation.tika.TikaExtractor.addOrReplaceDocumentWithException(TikaExtractor.java:235)
>
>         at
> org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester$PipelineAddEntryPoint.addOrReplaceDocumentWithException(IncrementalIngester.java:3226)
>
>         at
> org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester$PipelineAddFanout.sendDocument(IncrementalIngester.java:3077)
>
>         at
> org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester$PipelineObjectWithVersions.addOrReplaceDocumentWithException(IncrementalIngester.java:2708)
>
>         at
> org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester.documentIngest(IncrementalIngester.java:756)
>
>         at
> org.apache.manifoldcf.crawler.system.WorkerThread$ProcessActivity.ingestDocumentWithException(WorkerThread.java:1583)
>
>         at
> org.apache.manifoldcf.crawler.system.WorkerThread$ProcessActivity.ingestDocumentWithException(WorkerThread.java:1548)
>
>         at
> org.apache.manifoldcf.crawler.connectors.sharedrive.SharedDriveConnector.processDocuments(SharedDriveConnector.java:939)
>
>         at
> org.apache.manifoldcf.crawler.system.WorkerThread.run(WorkerThread.java:399)
>
> [Thread-35854-SendThread(kemp-formation-solr.citya.local:2181)] WARN
> org.apache.zookeeper.ClientCnxn - Client session timed out, have not heard
> from server in 28024ms for sessionid 0x100000050ae004d
>
> [Thread-35854-SendThread(kemp-formation-solr.citya.local:2181)] INFO
> org.apache.zookeeper.ClientCnxn - Client session timed out, have not heard
> from server in 28024ms for sessionid 0x100000050ae004d, closing socket
> connection and attempting reconnect
>
> [zkCallback-16-thread-2] WARN
> org.apache.solr.common.cloud.ConnectionManager - Watcher
> org.apache.solr.common.cloud.ConnectionManager@5382340 name:
> ZooKeeperConnection Watcher:kemp-formation-solr:2181 got event WatchedEvent
> state:Disconnected type:None path:null path: null type: None
>
> [zkCallback-16-thread-2] WARN
> org.apache.solr.common.cloud.ConnectionManager - zkClient has disconnected
>
> [Thread-35854-SendThread(kemp-formation-solr.citya.local:2181)] INFO
> org.apache.zookeeper.ClientCnxn - Opening socket connection to server
> kemp-formation-solr.citya.local/192.168.37.107:2181. Will not attempt to
> authenticate using SASL (unknown error)
>
> [Thread-35854-SendThread(kemp-formation-solr.citya.local:2181)] INFO
> org.apache.zookeeper.ClientCnxn - Socket connection established to
> kemp-formation-solr.citya.local/192.168.37.107:2181, initiating session
>
> agents process ran out of memory - shutting down
>
> java.lang.OutOfMemoryError: GC overhead limit exceeded
>
>         at
> org.apache.manifoldcf.core.database.Database.executeViaThread(Database.java:737)
>
>         at
> org.apache.manifoldcf.core.database.Database.executeUncachedQuery(Database.java:784)
>
>         at
> org.apache.manifoldcf.core.database.Database$QueryCacheExecutor.create(Database.java:1457)
>
>         at
> org.apache.manifoldcf.core.cachemanager.CacheManager.findObjectsAndExecute(CacheManager.java:146)
>
>         at
> org.apache.manifoldcf.core.database.Database.executeQuery(Database.java:204)
>
>         at
> org.apache.manifoldcf.core.database.DBInterfacePostgreSQL.performQuery(DBInterfacePostgreSQL.java:837)
>
>         at
> org.apache.manifoldcf.crawler.jobs.JobManager.getJobsReadyForInactivity(JobManager.java:8024)
>
>         at
> org.apache.manifoldcf.crawler.system.JobNotificationThread.run(JobNotificationThread.java:76)
>
> agents process ran out of memory - shutting down
>
> java.lang.OutOfMemoryError: GC overhead limit exceeded
>
>         at
> org.postgresql.jdbc.PgConnection.prepareStatement(PgConnection.java:1200)
>
>         at
> org.postgresql.jdbc.PgConnection.prepareStatement(PgConnection.java:1583)
>
>         at
> org.postgresql.jdbc.PgConnection.prepareStatement(PgConnection.java:372)
>
>         at
> org.apache.manifoldcf.core.database.Database.execute(Database.java:896)
>
>         at
> org.apache.manifoldcf.core.database.Database$ExecuteQueryThread.run(Database.java:696)
>
> [Thread-35854-SendThread(kemp-formation-solr.citya.local:2181)] INFO
> org.apache.zookeeper.ClientCnxn - Session establishment complete on server
> kemp-formation-solr.citya.local/192.168.37.107:2181, sessionid =
> 0x100000050ae004d, negotiated timeout = 40000
>
> [Thread-490] INFO org.eclipse.jetty.server.ServerConnector - Stopped
> ServerConnector@2a640157{HTTP/1.1}{0.0.0.0:8345}
>
> agents process ran out of memory - shutting down
>
> java.lang.OutOfMemoryError: GC overhead limit exceeded
>
>         at java.util.HashMap.resize(HashMap.java:704)
>
>         at java.util.HashMap.putVal(HashMap.java:629)
>
>         at java.util.HashMap.put(HashMap.java:612)
>
>         at
> org.apache.manifoldcf.core.cachemanager.CacheManager.findObjectsAndExecute(CacheManager.java:154)
>
>         at
> org.apache.manifoldcf.core.database.Database.executeQuery(Database.java:204)
>
>         at
> org.apache.manifoldcf.core.database.DBInterfacePostgreSQL.performQuery(DBInterfacePostgreSQL.java:837)
>
>         at
> org.apache.manifoldcf.crawler.jobs.JobManager.processParentHashSet(JobManager.java:5642)
>
>         at
> org.apache.manifoldcf.crawler.jobs.JobManager.calculateAffectedRestoreCarrydownChildren(JobManager.java:5581)
>
>         at
> org.apache.manifoldcf.crawler.jobs.JobManager.finishDocuments(JobManager.java:5453)
>
>         at
> org.apache.manifoldcf.crawler.system.WorkerThread.run(WorkerThread.java:570)
>
> agents process ran out of memory - shutting down
>
> java.lang.OutOfMemoryError: GC overhead limit exceeded
>
>         at java.util.Arrays.copyOf(Arrays.java:3308)
>
>         at java.util.BitSet.ensureCapacity(BitSet.java:337)
>
>         at java.util.BitSet.expandTo(BitSet.java:352)
>
>         at java.util.BitSet.set(BitSet.java:447)
>
>         at
> de.l3s.boilerpipe.sax.BoilerpipeHTMLContentHandler.characters(BoilerpipeHTMLContentHandler.java:267)
>
>         at
> org.apache.tika.parser.html.BoilerpipeContentHandler.characters(BoilerpipeContentHandler.java:155)
>
>         at
> org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:146)
>
>         at
> org.apache.tika.sax.SecureContentHandler.characters(SecureContentHandler.java:270)
>
>         at
> org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:146)
>
>         at
> org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:146)
>
>         at
> org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:146)
>
>         at
> org.apache.tika.sax.SafeContentHandler.access$001(SafeContentHandler.java:46)
>
>         at
> org.apache.tika.sax.SafeContentHandler$1.write(SafeContentHandler.java:82)
>
>         at
> org.apache.tika.sax.SafeContentHandler.filter(SafeContentHandler.java:140)
>
>         at
> org.apache.tika.sax.SafeContentHandler.characters(SafeContentHandler.java:287)
>
>         at
> org.apache.tika.sax.XHTMLContentHandler.characters(XHTMLContentHandler.java:279)
>
>         at
> org.apache.tika.sax.XHTMLContentHandler.characters(XHTMLContentHandler.java:306)
>
>         at
> org.apache.tika.parser.microsoft.ooxml.XSSFExcelExtractorDecorator$SheetTextAsHTML.cell(XSSFExcelExtractorDecorator.java:431)
>
>         at
> org.apache.poi.xssf.eventusermodel.XSSFSheetXMLHandler.endElement(XSSFSheetXMLHandler.java:380)
>
>         at
> org.apache.tika.parser.microsoft.ooxml.XSSFExcelExtractorDecorator$XSSFSheetInterestingPartsCapturer.endElement(XSSFExcelExtractorDecorator.java:520)
>
>         at org.apache.xerces.parsers.AbstractSAXParser.endElement(Unknown
> Source)
>
>         at
> org.apache.xerces.impl.XMLNSDocumentScannerImpl.scanEndElement(Unknown
> Source)
>
>         at
> org.apache.xerces.impl.XMLDocumentFragmentScannerImpl$FragmentContentDispatcher.dispatch(Unknown
> Source)
>
>         at
> org.apache.xerces.impl.XMLDocumentFragmentScannerImpl.scanDocument(Unknown
> Source)
>
>         at org.apache.xerces.parsers.XML11Configuration.parse(Unknown
> Source)
>
>         at org.apache.xerces.parsers.XML11Configuration.parse(Unknown
> Source)
>
>         at org.apache.xerces.parsers.XMLParser.parse(Unknown Source)
>
>         at org.apache.xerces.parsers.AbstractSAXParser.parse(Unknown
> Source)
>
>         at
> org.apache.xerces.jaxp.SAXParserImpl$JAXPSAXParser.parse(Unknown Source)
>
>         at
> org.apache.tika.parser.microsoft.ooxml.XSSFExcelExtractorDecorator.processSheet(XSSFExcelExtractorDecorator.java:344)
>
>         at
> org.apache.tika.parser.microsoft.ooxml.XSSFExcelExtractorDecorator.buildXHTML(XSSFExcelExtractorDecorator.java:167)
>
>         at
> org.apache.tika.parser.microsoft.ooxml.AbstractOOXMLExtractor.getXHTML(AbstractOOXMLExtractor.java:135)
>
> [Thread-490] INFO org.apache.zookeeper.ZooKeeper - Session:
> 0x100000050ae004e closed
>
> [Thread-257943-EventThread] INFO org.apache.zookeeper.ClientCnxn -
> EventThread shut down for session: 0x100000050ae004e
>
> [Thread-490] INFO org.apache.zookeeper.ZooKeeper - Session:
> 0x100000050ae004d closed
>
> [Thread-35854-EventThread] INFO org.apache.zookeeper.ClientCnxn -
> EventThread shut down for session: 0x100000050ae004d
>
> [Thread-490] INFO org.apache.zookeeper.ZooKeeper - Session:
> 0x2000000b80d004a closed
>
> [Thread-8765-EventThread] INFO org.apache.zookeeper.ClientCnxn -
> EventThread shut down for session: 0x2000000b80d004a
>
> [Thread-490] INFO org.apache.zookeeper.ZooKeeper - Session:
> 0x2000000b80d004b closed
>
> [Thread-35853-EventThread] INFO org.apache.zookeeper.ClientCnxn -
> EventThread shut down for session: 0x2000000b80d004b
>
> [Thread-490] INFO org.apache.zookeeper.ZooKeeper - Session:
> 0xff00000201970046 closed
>
> [Thread-6991-EventThread] INFO org.apache.zookeeper.ClientCnxn -
> EventThread shut down for session: 0xff00000201970046
>
> [Thread-490] INFO org.apache.zookeeper.ZooKeeper - Session:
> 0x100000050ae004c closed
>
> [Thread-8699-EventThread] INFO org.apache.zookeeper.ClientCnxn -
> EventThread shut down for session: 0x100000050ae004c
>
> [Thread-490] INFO org.eclipse.jetty.server.handler.ContextHandler -
> Stopped
> o.e.j.w.WebAppContext@44d52de2{/mcf-api-service,file:/tmp/jetty-0.0.0.0-8345-mcf-api-service.war-_mcf-api-service-any-559052738855414857.dir/webapp/,UNAVAILABLE}{/opt/manifoldcf-trunk/bin/./../web-proprietary/war/mcf-api-service.war}
>
> [Thread-490] INFO org.eclipse.jetty.server.handler.ContextHandler -
> Stopped
> o.e.j.w.WebAppContext@60410cd{/mcf-authority-service,file:/tmp/jetty-0.0.0.0-8345-mcf-authority-service.war-_mcf-authority-service-any-927770358411352606.dir/webapp/,UNAVAILABLE}{/opt/manifoldcf-trunk/bin/./../web-proprietary/war/mcf-authority-service.war}
>
> [Thread-490] INFO org.apache.zookeeper.ZooKeeper - Session:
> 0x2000000b80d004c closed
>
> [Thread-262666-EventThread] INFO org.apache.zookeeper.ClientCnxn -
> EventThread shut down for session: 0x2000000b80d004c
>
> [Thread-490] INFO org.apache.zookeeper.ZooKeeper - Session:
> 0xff00000201970048 closed
>
> [Thread-244171-EventThread] INFO org.apache.zookeeper.ClientCnxn -
> EventThread shut down for session: 0xff00000201970048
>
> [Thread-490] INFO org.apache.zookeeper.ZooKeeper - Session:
> 0xff00000201970049 closed
>
> [Thread-269948-EventThread] INFO org.apache.zookeeper.ClientCnxn -
> EventThread shut down for session: 0xff00000201970049
>
>
>
> I have deactivated history to improve performance. So, can I find the last
> file with an SQL query?
>
>
>
> Maxence,
>
>
>
> *From:* Karl Wright [mailto:[email protected]]
> *Sent:* Tuesday, July 24, 2018 16:04
> *To:* [email protected]
> *Subject:* Re: Out of memory, one file bug i think
>
>
>
> Hi Maxence,
>
>
>
> You would want to turn on connector debugging INSTEAD of the debugging
> you've turned on, which is very noisy and not helpful.
>
>
>
> In global properties: org.apache.manifoldcf.connectors value DEBUG
>
>
>
> Karl
>
>
>
>
>
> On Tue, Jul 24, 2018 at 9:12 AM msaunier <[email protected]> wrote:
>
> With debug:
>
>
>
> [Thread-5234-SendThread(kemp-formation-solr.citya.local:2181)] WARN
> org.apache.zookeeper.ClientCnxn - Client session timed out, have not heard
> from server in 28034ms for sessionid 0x100000050ae0049
>
> [Thread-5234-SendThread(kemp-formation-solr.citya.local:2181)] INFO
> org.apache.zookeeper.ClientCnxn - Client session timed out, have not heard
> from server in 28034ms for sessionid 0x100000050ae0049, closing socket
> connection and attempting reconnect
>
> [Thread-31532-SendThread(kemp-formation-solr.citya.local:2181)] WARN
> org.apache.zookeeper.ClientCnxn - Client session timed out, have not heard
> from server in 27708ms for sessionid 0xff00000201970044
>
> [Thread-7573-SendThread(kemp-formation-solr.citya.local:2181)] WARN
> org.apache.zookeeper.ClientCnxn - Client session timed out, have not heard
> from server in 27737ms for sessionid 0xff00000201970043
>
> [Thread-7573-SendThread(kemp-formation-solr.citya.local:2181)] INFO
> org.apache.zookeeper.ClientCnxn - Client session timed out, have not heard
> from server in 27737ms for sessionid 0xff00000201970043, closing socket
> connection and attempting reconnect
>
> [Thread-31551-SendThread(kemp-formation-solr.citya.local:2181)] WARN
> org.apache.zookeeper.ClientCnxn - Client session timed out, have not heard
> from server in 28316ms for sessionid 0x100000050ae004b
>
> [Thread-7602-SendThread(kemp-formation-solr.citya.local:2181)] WARN
> org.apache.zookeeper.ClientCnxn - Client session timed out, have not heard
> from server in 28394ms for sessionid 0x2000000b80d0047
>
> [Thread-7602-SendThread(kemp-formation-solr.citya.local:2181)] INFO
> org.apache.zookeeper.ClientCnxn - Client session timed out, have not heard
> from server in 28394ms for sessionid 0x2000000b80d0047, closing socket
> connection and attempting reconnect
>
> [Thread-31532-SendThread(kemp-formation-solr.citya.local:2181)] INFO
> org.apache.zookeeper.ClientCnxn - Client session timed out, have not heard
> from server in 27708ms for sessionid 0xff00000201970044, closing socket
> connection and attempting reconnect
>
> [Thread-5234-SendThread(kemp-formation-solr.citya.local:2181)] INFO
> org.apache.zookeeper.ClientCnxn - Opening socket connection to server
> kemp-formation-solr.citya.local/192.168.37.107:2181. Will not attempt to
> authenticate using SASL (unknown error)
>
> agents process ran out of memory - shutting down
>
> [Thread-5234-SendThread(kemp-formation-solr.citya.local:2181)] INFO
> org.apache.zookeeper.ClientCnxn - Socket connection established to
> kemp-formation-solr.citya.local/192.168.37.107:2181, initiating session
>
> [Thread-7538-SendThread(kemp-formation-solr.citya.local:2181)] WARN
> org.apache.zookeeper.ClientCnxn - Client session timed out, have not heard
> from server in 36805ms for sessionid 0x2000000b80d0046
>
> [Thread-7538-SendThread(kemp-formation-solr.citya.local:2181)] INFO
> org.apache.zookeeper.ClientCnxn - Client session timed out, have not heard
> from server in 36805ms for sessionid 0x2000000b80d0046, closing socket
> connection and attempting reconnect
>
> java.lang.OutOfMemoryError: GC overhead limit exceeded
>
>         at java.lang.StringBuilder.toString(StringBuilder.java:407)
>
>         at
> org.apache.manifoldcf.core.cachemanager.CacheManager.readSharedData(CacheManager.java:849)
>
>         at
> org.apache.manifoldcf.core.cachemanager.CacheManager.hasExpired(CacheManager.java:483)
>
>         at
> org.apache.manifoldcf.core.cachemanager.CacheManager.lookupObject(CacheManager.java:454)
>
>         at
> org.apache.manifoldcf.core.cachemanager.CacheManager.findObjectsAndExecute(CacheManager.java:131)
>
>         at
> org.apache.manifoldcf.core.database.Database.executeQuery(Database.java:204)
>
>         at
> org.apache.manifoldcf.core.database.DBInterfacePostgreSQL.performQuery(DBInterfacePostgreSQL.java:862)
>
>         at
> org.apache.manifoldcf.core.database.BaseTable.performQuery(BaseTable.java:236)
>
>         at
> org.apache.manifoldcf.crawler.jobs.Jobs.deletingJobsPresent(Jobs.java:3133)
>
>         at
> org.apache.manifoldcf.crawler.jobs.JobManager.getNextDeletableDocuments(JobManager.java:1862)
>
>         at
> org.apache.manifoldcf.crawler.system.DocumentDeleteStufferThread.run(DocumentDeleteStufferThread.java:108)
>
> [Thread-7573-SendThread(kemp-formation-solr.citya.local:2181)] INFO
> org.apache.zookeeper.ClientCnxn - Opening socket connection to server
> kemp-formation-solr.citya.local/192.168.37.107:2181. Will not attempt to
> authenticate using SASL (unknown error)
>
> agents process ran out of memory - shutting down
>
> [Thread-7574-SendThread(kemp-formation-solr.citya.local:2181)] WARN
> org.apache.zookeeper.ClientCnxn - Client session timed out, have not heard
> from server in 27763ms for sessionid 0x100000050ae004a
>
> [Thread-7574-SendThread(kemp-formation-solr.citya.local:2181)] INFO
> org.apache.zookeeper.ClientCnxn - Client session timed out, have not heard
> from server in 27763ms for sessionid 0x100000050ae004a, closing socket
> connection and attempting reconnect
>
> [zkCallback-3-thread-7] WARN
> org.apache.solr.common.cloud.ConnectionManager - Watcher
> org.apache.solr.common.cloud.ConnectionManager@7a5c701e name:
> ZooKeeperConnection Watcher:kemp-formation-solr:2181 got event WatchedEvent
> state:Disconnected type:None path:null path: null type: None
>
> [zkCallback-3-thread-7] WARN
> org.apache.solr.common.cloud.ConnectionManager - zkClient has disconnected
>
> [Thread-31551-SendThread(kemp-formation-solr.citya.local:2181)] INFO
> org.apache.zookeeper.ClientCnxn - Client session timed out, have not heard
> from server in 28316ms for sessionid 0x100000050ae004b, closing socket
> connection and attempting reconnect
>
> java.lang.OutOfMemoryError: GC overhead limit exceeded
>
> [Thread-7573-SendThread(kemp-formation-solr.citya.local:2181)] INFO
> org.apache.zookeeper.ClientCnxn - Socket connection established to
> kemp-formation-solr.citya.local/192.168.37.107:2181, initiating session
>
> [zkCallback-11-thread-5] WARN
> org.apache.solr.common.cloud.ConnectionManager - Watcher
> org.apache.solr.common.cloud.ConnectionManager@53181a58 name:
> ZooKeeperConnection Watcher:kemp-formation-solr:2181 got event WatchedEvent
> state:Disconnected type:None path:null path: null type: None
>
> [zkCallback-11-thread-5] WARN
> org.apache.solr.common.cloud.ConnectionManager - zkClient has disconnected
>
> [Thread-7573-SendThread(kemp-formation-solr.citya.local:2181)] WARN
> org.apache.zookeeper.ClientCnxn - Unable to reconnect to ZooKeeper service,
> session 0xff00000201970043 has expired
>
> [Thread-7573-SendThread(kemp-formation-solr.citya.local:2181)] INFO
> org.apache.zookeeper.ClientCnxn - Unable to reconnect to ZooKeeper service,
> session 0xff00000201970043 has expired, closing socket connection
>
> [Thread-7573-EventThread] INFO org.apache.zookeeper.ClientCnxn -
> EventThread shut down for session: 0xff00000201970043
>
> [zkCallback-11-thread-2] WARN
> org.apache.solr.common.cloud.ConnectionManager - Watcher
> org.apache.solr.common.cloud.ConnectionManager@53181a58 name:
> ZooKeeperConnection Watcher:kemp-formation-solr:2181 got event WatchedEvent
> state:Expired type:None path:null path: null type: None
>
> [zkCallback-11-thread-2] WARN
> org.apache.solr.common.cloud.ConnectionManager - Our previous ZooKeeper
> session was expired. Attempting to reconnect to recover relationship with
> ZooKeeper...
>
> [Thread-5234-SendThread(kemp-formation-solr.citya.local:2181)] WARN
> org.apache.zookeeper.ClientCnxn - Unable to reconnect to ZooKeeper service,
> session 0x100000050ae0049 has expired
>
> [Thread-5234-SendThread(kemp-formation-solr.citya.local:2181)] INFO
> org.apache.zookeeper.ClientCnxn - Unable to reconnect to ZooKeeper service,
> session 0x100000050ae0049 has expired, closing socket connection
>
> [zkCallback-11-thread-2] WARN
> org.apache.solr.common.cloud.DefaultConnectionStrategy - Connection expired
> - starting a new one...
>
> [zkCallback-11-thread-2] INFO org.apache.zookeeper.ZooKeeper - Initiating
> client connection, connectString=kemp-formation-solr:2181
> sessionTimeout=60000
> watcher=org.apache.solr.common.cloud.ConnectionManager@53181a58
>
> [Thread-5234-EventThread] INFO org.apache.zookeeper.ClientCnxn -
> EventThread shut down for session: 0x100000050ae0049
>
> [zkCallback-3-thread-4] WARN
> org.apache.solr.common.cloud.ConnectionManager - Watcher
> org.apache.solr.common.cloud.ConnectionManager@7a5c701e name:
> ZooKeeperConnection Watcher:kemp-formation-solr:2181 got event WatchedEvent
> state:Expired type:None path:null path: null type: None
>
> [zkCallback-3-thread-4] WARN
> org.apache.solr.common.cloud.ConnectionManager - Our previous ZooKeeper
> session was expired. Attempting to r
>
>
