Thanks for the ideas and the sanity check! Based on your feedback, we've been able to narrow the problem down to something in the custom output connector. It seems we need to join the thread at the end.
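For anyone who finds this thread with the same symptom, here is a minimal sketch of the general pattern. It is not our actual connector code. It assumes the connector hands each document to a worker thread that posts it to the API. `CustomApiSender`, `sendDocument` and `sendToApi` are hypothetical names, not ManifoldCF APIs. The point is that the worker is joined before the method returns, so it cannot outlive the call and pile up.

```java
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical example class; names are illustrative, not ManifoldCF APIs.
class CustomApiSender {

  // Hypothetical stand-in for the call that posts a document to the custom API.
  static String sendToApi(String documentUri) {
    return "sent:" + documentUri;
  }

  // Runs the API call on a worker thread, then joins that thread before
  // returning, so no worker is left running after each document.
  static String sendDocument(String documentUri) throws InterruptedException {
    AtomicReference<String> result = new AtomicReference<>();
    AtomicReference<RuntimeException> failure = new AtomicReference<>();

    Thread worker = new Thread(() -> {
      try {
        result.set(sendToApi(documentUri));
      } catch (RuntimeException e) {
        // Hand the failure back to the calling thread instead of losing it.
        failure.set(e);
      }
    }, "custom-api-sender");

    worker.start();
    // Join at the end: block until the worker thread has finished.
    worker.join();

    if (failure.get() != null) {
      throw failure.get();
    }
    return result.get();
  }

  public static void main(String[] args) throws InterruptedException {
    // Prints "sent:http://example.com/feed/item1"
    System.out.println(sendDocument("http://example.com/feed/item1"));
  }
}
```

If the API call can block indefinitely, `Thread.join(long millis)` lets the caller wait with a timeout instead; what to do with a worker that is still alive after the timeout is a separate decision.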
On Thu, Mar 8, 2018 at 9:37 AM, Karl Wright <[email protected]> wrote:

> As a sanity check, I ran the postgresql RSS connector IT test on trunk and
> it passed:
>
> >>>>>>
> run-IT-postgresql:
> [junit] Testsuite: org.apache.manifoldcf.crawler.connectors.rss.tests.RSSSimpleCrawlPostgresqlIT
> [junit] Configuration file successfully read
> [junit] [main] INFO org.eclipse.jetty.util.log - Logging initialized @3336ms
> [junit] [main] INFO org.eclipse.jetty.server.Server - jetty-9.2.3.v20140905
> [junit] [main] INFO org.eclipse.jetty.server.handler.ContextHandler - Started o.e.j.w.WebAppContext@4d1c005e{/mcf-crawler-ui,file:/C:/Users/kawright/AppData/Local/Temp/jetty-0.0.0.0-8346-mcf-crawler-ui.war-_mcf-crawler-ui-any-4871569714684839734.dir/webapp/,AVAILABLE}{C:\wip\mcf\trunk\dist/web/war/mcf-crawler-ui.war}
> [junit] [main] INFO org.eclipse.jetty.server.handler.ContextHandler - Started o.e.j.w.WebAppContext@8462f31{/mcf-authority-service,file:/C:/Users/kawright/AppData/Local/Temp/jetty-0.0.0.0-8346-mcf-authority-service.war-_mcf-authority-service-any-8765187688005999492.dir/webapp/,AVAILABLE}{C:\wip\mcf\trunk\dist/web/war/mcf-authority-service.war}
> [junit] [main] INFO org.eclipse.jetty.server.handler.ContextHandler - Started o.e.j.w.WebAppContext@24569dba{/mcf-api-service,file:/C:/Users/kawright/AppData/Local/Temp/jetty-0.0.0.0-8346-mcf-api-service.war-_mcf-api-service-any-1263632524762735599.dir/webapp/,AVAILABLE}{C:\wip\mcf\trunk\dist/web/war/mcf-api-service.war}
> [junit] [main] INFO org.eclipse.jetty.server.ServerConnector - Started ServerConnector@1e1ff947{HTTP/1.1}{0.0.0.0:8346}
> [junit] [main] INFO org.eclipse.jetty.server.Server - Started @6277ms
> [junit] [main] INFO org.eclipse.jetty.server.Server - jetty-9.2.3.v20140905
> [junit] [main] INFO org.eclipse.jetty.server.handler.ContextHandler - Started o.e.j.s.ServletContextHandler@7d286fb6{/rss,null,AVAILABLE}
> [junit] [main] INFO org.eclipse.jetty.server.ServerConnector - Started ServerConnector@3eb77ea8{HTTP/1.1}{0.0.0.0:8189}
> [junit] [main] INFO org.eclipse.jetty.server.Server - Started @6290ms
> [junit] Crawl required 90542 milliseconds
> [junit] [main] INFO org.eclipse.jetty.server.ServerConnector - Stopped ServerConnector@3eb77ea8{HTTP/1.1}{0.0.0.0:8189}
> [junit] [main] INFO org.eclipse.jetty.server.handler.ContextHandler - Stopped o.e.j.s.ServletContextHandler@7d286fb6{/rss,null,UNAVAILABLE}
> [junit] [main] INFO org.eclipse.jetty.server.ServerConnector - Stopped ServerConnector@1e1ff947{HTTP/1.1}{0.0.0.0:8346}
> [junit] [main] INFO org.eclipse.jetty.server.handler.ContextHandler - Stopped o.e.j.w.WebAppContext@24569dba{/mcf-api-service,file:/C:/Users/kawright/AppData/Local/Temp/jetty-0.0.0.0-8346-mcf-api-service.war-_mcf-api-service-any-1263632524762735599.dir/webapp/,UNAVAILABLE}{C:\wip\mcf\trunk\dist/web/war/mcf-api-service.war}
> [junit] [main] INFO org.eclipse.jetty.server.handler.ContextHandler - Stopped o.e.j.w.WebAppContext@8462f31{/mcf-authority-service,file:/C:/Users/kawright/AppData/Local/Temp/jetty-0.0.0.0-8346-mcf-authority-service.war-_mcf-authority-service-any-8765187688005999492.dir/webapp/,UNAVAILABLE}{C:\wip\mcf\trunk\dist/web/war/mcf-authority-service.war}
> [junit] [main] INFO org.eclipse.jetty.server.handler.ContextHandler - Stopped o.e.j.w.WebAppContext@4d1c005e{/mcf-crawler-ui,file:/C:/Users/kawright/AppData/Local/Temp/jetty-0.0.0.0-8346-mcf-crawler-ui.war-_mcf-crawler-ui-any-4871569714684839734.dir/webapp/,UNAVAILABLE}{C:\wip\mcf\trunk\dist/web/war/mcf-crawler-ui.war}
> [junit] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 126.5 sec
>
> BUILD SUCCESSFUL
> Total time: 2 minutes 8 seconds
> <<<<<<
>
> This is running against my installed laptop version of Postgresql on
> Windows (version 9.3), with the shipping Postgresql JDBC driver 42.1.3.
> The test is a simple crawl against a locally-written RSS service.
>
> Karl
>
> On Thu, Mar 8, 2018 at 9:54 AM, Karl Wright <[email protected]> wrote:
>
>> I've reviewed all changes to the RSS connector and to the framework over
>> the last year, and none of them could reasonably have been expected to have
>> any kind of effect like this. The only things changed were the redirect
>> strategy and updating to the latest Postgresql JDBC driver.
>>
>> If the problem doesn't occur in the single-process example, the next
>> question is: do you have a multiprocess setup? If so, try the multiprocess
>> example and see if that succeeds. If it does, the problem is how we work
>> with Postgresql.
>>
>> Karl
>>
>> On Thu, Mar 8, 2018 at 9:41 AM, Karl Wright <[email protected]> wrote:
>>
>>> Hi Mike,
>>>
>>> You are the third person this morning that has reported this in
>>> conjunction with Postgresql. It is possible that some behavior we count on
>>> broke in the latest postgresql release. Can you tell me what version you
>>> are using? Do you see the same behavior when you run with the built-in
>>> HSQLDB example?
>>>
>>> Karl
>>>
>>> On Thu, Mar 8, 2018 at 9:32 AM, Mike Hugo <[email protected]> wrote:
>>>
>>>> Hello,
>>>>
>>>> I set up a new manifold instance based on the simple example. I
>>>> modified properties.xml to point to a postgresql database and then set it
>>>> up to read an RSS feed. It uses a custom output connector to send the data
>>>> to a custom API.
>>>>
>>>> I've noticed that it starts properly, but it only pulls in 3 or 4
>>>> records before it "hangs" and doesn't pull in more docs after that. If I
>>>> bounce the server then it will pull in 3 or 4 more docs, but then seems to
>>>> hang again.
>>>>
>>>> I can add a new RSS feed and start it, but it won't pull in any
>>>> documents until the server is bounced.
>>>>
>>>> I increased the value of org.apache.manifoldcf.crawler.threads and
>>>> that seems to help, but it just delays the same behavior. For example, it
>>>> might pull in 10 or 15 docs, but then stops pulling them in again. No
>>>> messages in the logs.
>>>>
>>>> It does appear that it's spawning many, many of these threads:
>>>> ExecuteQueryThread
>>>>
>>>> Any ideas where to start looking or how to debug why it hangs after
>>>> only a few documents?
>>>>
>>>> Thanks!!
>>>>
>>>> Mike
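One quick way to confirm that threads are accumulating, as in the original report (many ExecuteQueryThread instances), is to count live threads by name. The sketch below is hypothetical debugging code, not part of ManifoldCF. It only sees threads in the JVM it runs in, so it would need to be called from inside the crawler process (for example, logged from the connector). Running the JDK's `jstack <pid>` against the process gives the same information from outside.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical debugging helper; not part of ManifoldCF.
class ThreadCensus {

  // Counts the live threads in the current JVM, grouped by name with any
  // trailing number removed, so "Worker-1" and "Worker-2" share one bucket.
  static Map<String, Integer> countByName() {
    Map<String, Integer> counts = new TreeMap<>();
    for (Thread t : Thread.getAllStackTraces().keySet()) {
      String key = t.getName().replaceAll("[-_ #]*\\d+$", "");
      counts.merge(key, 1, Integer::sum);
    }
    return counts;
  }

  public static void main(String[] args) {
    // One line per thread-name group: count, then name.
    countByName().forEach((name, n) -> System.out.println(n + "\t" + name));
  }
}
```

Calling `countByName()` periodically and watching whether one bucket keeps growing between documents is usually enough to spot a leak.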
