Great, thanks for the update.

Karl

On Thu, Mar 8, 2018 at 10:47 AM, Mike Hugo wrote:

> Thanks for the ideas and the sanity check!  Based on your feedback, we've
> been able to narrow the problem down to something in the custom output
> connector.  It seems we need to join the thread at the end.
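For anyone reading this thread later: below is a minimal Java sketch of what joining a helper thread "at the end" can look like. The actual connector code is not shown in this thread, so this only illustrates the general pattern under assumptions. The class `JoinedWorkerSketch` and the method `sendDocument` are hypothetical names, not part of the ManifoldCF output connector API, and the helper thread just increments a counter where a real connector would call the custom API.

```java
import java.util.concurrent.atomic.AtomicInteger;

/**
 * Hypothetical sketch, not the ManifoldCF API: a connector-style method that
 * hands one document to a helper thread and joins that thread before
 * returning.
 */
public class JoinedWorkerSketch {

  // Number of documents the helper threads have finished "sending".
  private static final AtomicInteger SENT = new AtomicInteger();

  static int sentCount() {
    return SENT.get();
  }

  /** Sends one document on a helper thread and waits for that thread to end. */
  static Thread sendDocument(String docId) {
    Thread worker = new Thread(() -> {
      // Placeholder for the real call to the custom API (assumption).
      SENT.incrementAndGet();
    }, "send-" + docId);
    worker.start();
    try {
      // Joining ensures the helper has finished before the caller treats the
      // document as done, and keeps helper threads from accumulating.
      worker.join();
    } catch (InterruptedException e) {
      // Preserve the interrupt status for the caller, then fail loudly.
      Thread.currentThread().interrupt();
      throw new IllegalStateException("Interrupted while waiting for " + worker.getName(), e);
    }
    return worker;
  }

  public static void main(String[] args) {
    for (int i = 0; i < 5; i++) {
      sendDocument("doc" + i);
    }
    // Every helper has been joined, so all five increments are visible here.
    System.out.println("sent=" + sentCount());
  }
}
```

Running `main` prints `sent=5`: each helper is joined before `sendDocument` returns, so its work is guaranteed to be complete and visible to the caller. Without the join, the caller could move on while helpers are still running, and they could pile up.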
>
> On Thu, Mar 8, 2018 at 9:37 AM, Karl Wright wrote:
>
>> As a sanity check, I ran the Postgresql RSS connector IT test on trunk,
>> and it passed:
>>
>> >>>>>>
>> run-IT-postgresql:
>>     [junit] Testsuite: org.apache.manifoldcf.crawler.connectors.rss.tests.RSSSimpleCrawlPostgresqlIT
>>     [junit] Configuration file successfully read
>>     [junit] [main] INFO org.eclipse.jetty.util.log - Logging initialized @3336ms
>>     [junit] [main] INFO org.eclipse.jetty.server.Server - jetty-9.2.3.v20140905
>>     [junit] [main] INFO org.eclipse.jetty.server.handler.ContextHandler - Started o.e.j.w.WebAppContext@4d1c005e{/mcf-crawler-ui,file:/C:/Users/kawright/AppData/Local/Temp/jetty-0.0.0.0-8346-mcf-crawler-ui.war-_mcf-crawler-ui-any-4871569714684839734.dir/webapp/,AVAILABLE}{C:\wip\mcf\trunk\dist/web/war/mcf-crawler-ui.war}
>>     [junit] [main] INFO org.eclipse.jetty.server.handler.ContextHandler - Started o.e.j.w.WebAppContext@8462f31{/mcf-authority-service,file:/C:/Users/kawright/AppData/Local/Temp/jetty-0.0.0.0-8346-mcf-authority-service.war-_mcf-authority-service-any-8765187688005999492.dir/webapp/,AVAILABLE}{C:\wip\mcf\trunk\dist/web/war/mcf-authority-service.war}
>>     [junit] [main] INFO org.eclipse.jetty.server.handler.ContextHandler - Started o.e.j.w.WebAppContext@24569dba{/mcf-api-service,file:/C:/Users/kawright/AppData/Local/Temp/jetty-0.0.0.0-8346-mcf-api-service.war-_mcf-api-service-any-1263632524762735599.dir/webapp/,AVAILABLE}{C:\wip\mcf\trunk\dist/web/war/mcf-api-service.war}
>>     [junit] [main] INFO org.eclipse.jetty.server.ServerConnector - Started ServerConnector@1e1ff947{HTTP/1.1}{0.0.0.0:8346}
>>     [junit] [main] INFO org.eclipse.jetty.server.Server - Started @6277ms
>>     [junit] [main] INFO org.eclipse.jetty.server.Server - jetty-9.2.3.v20140905
>>     [junit] [main] INFO org.eclipse.jetty.server.handler.ContextHandler - Started o.e.j.s.ServletContextHandler@7d286fb6{/rss,null,AVAILABLE}
>>     [junit] [main] INFO org.eclipse.jetty.server.ServerConnector - Started ServerConnector@3eb77ea8{HTTP/1.1}{0.0.0.0:8189}
>>     [junit] [main] INFO org.eclipse.jetty.server.Server - Started @6290ms
>>     [junit] Crawl required 90542 milliseconds
>>     [junit] [main] INFO org.eclipse.jetty.server.ServerConnector - Stopped ServerConnector@3eb77ea8{HTTP/1.1}{0.0.0.0:8189}
>>     [junit] [main] INFO org.eclipse.jetty.server.handler.ContextHandler - Stopped o.e.j.s.ServletContextHandler@7d286fb6{/rss,null,UNAVAILABLE}
>>     [junit] [main] INFO org.eclipse.jetty.server.ServerConnector - Stopped ServerConnector@1e1ff947{HTTP/1.1}{0.0.0.0:8346}
>>     [junit] [main] INFO org.eclipse.jetty.server.handler.ContextHandler - Stopped o.e.j.w.WebAppContext@24569dba{/mcf-api-service,file:/C:/Users/kawright/AppData/Local/Temp/jetty-0.0.0.0-8346-mcf-api-service.war-_mcf-api-service-any-1263632524762735599.dir/webapp/,UNAVAILABLE}{C:\wip\mcf\trunk\dist/web/war/mcf-api-service.war}
>>     [junit] [main] INFO org.eclipse.jetty.server.handler.ContextHandler - Stopped o.e.j.w.WebAppContext@8462f31{/mcf-authority-service,file:/C:/Users/kawright/AppData/Local/Temp/jetty-0.0.0.0-8346-mcf-authority-service.war-_mcf-authority-service-any-8765187688005999492.dir/webapp/,UNAVAILABLE}{C:\wip\mcf\trunk\dist/web/war/mcf-authority-service.war}
>>     [junit] [main] INFO org.eclipse.jetty.server.handler.ContextHandler - Stopped o.e.j.w.WebAppContext@4d1c005e{/mcf-crawler-ui,file:/C:/Users/kawright/AppData/Local/Temp/jetty-0.0.0.0-8346-mcf-crawler-ui.war-_mcf-crawler-ui-any-4871569714684839734.dir/webapp/,UNAVAILABLE}{C:\wip\mcf\trunk\dist/web/war/mcf-crawler-ui.war}
>>     [junit] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 126.5 sec
>>     [junit]
>>
>> BUILD SUCCESSFUL
>> Total time: 2 minutes 8 seconds
>> <<<<<<
>>
>> This is running against my installed laptop version of Postgresql on
>> Windows (version 9.3), with the shipping Postgresql JDBC driver 42.1.3.
>> The test is a simple crawl against a locally-written RSS service.
>>
>>
>> Karl
>>
>>
>> On Thu, Mar 8, 2018 at 9:54 AM, Karl Wright wrote:
>>
>>> I've reviewed all changes to the RSS connector and to the framework over
>>> the last year, and none of them could reasonably be expected to have
>>> any effect like this.  The only things that changed were the redirect
>>> strategy and an update to the latest Postgresql JDBC driver.
>>>
>>> If the problem doesn't occur in the single-process example, the next
>>> question is: do you have a multiprocess setup?  If so, try the multiprocess
>>> example and see if that succeeds.  If it does, the problem is how we work
>>> with Postgresql.
>>>
>>> Karl
>>>
>>>
>>> On Thu, Mar 8, 2018 at 9:41 AM, Karl Wright wrote:
>>>
>>>> Hi Mike,
>>>>
>>>> You are the third person this morning who has reported this in
>>>> conjunction with Postgresql.  It is possible that some behavior we count on
>>>> broke in the latest Postgresql release.  Can you tell me what version you
>>>> are using?  Do you see the same behavior when you run with the built-in
>>>> HSQLDB example?
>>>>
>>>> Karl
>>>>
>>>>
>>>> On Thu, Mar 8, 2018 at 9:32 AM, Mike Hugo wrote:
>>>>
>>>>> Hello,
>>>>>
>>>>> I set up a new ManifoldCF instance based on the simple example.  I
>>>>> modified properties.xml to point to a Postgresql database and then set it
>>>>> up to read an RSS feed.  It uses a custom output connector to send the
>>>>> data to a custom API.
>>>>>
>>>>> I've noticed that it starts properly, but it only pulls in 3 or 4
>>>>> records before it "hangs" and doesn't pull in more docs after that.  If I
>>>>> bounce the server, it will pull in 3 or 4 more docs, but then it seems to
>>>>> hang again.
>>>>>
>>>>> I can add a new RSS feed and start it, but it won't pull in any
>>>>> documents until the server is bounced.
>>>>>
>>>>> I increased the value of org.apache.manifoldcf.crawler.threads and
>>>>> that seems to help, but it just delays the same behavior.  For example, it
>>>>> might pull in 10 or 15 docs, but then stops pulling them in again.  No
>>>>> messages in the logs.
>>>>>
>>>>> It does appear that it's spawning many, many of these threads:
>>>>> ExecuteQueryThread
>>>>>
>>>>> Any ideas where to start looking or how to debug why it hangs after
>>>>> only a few documents?
>>>>>
>>>>> Thanks!!
>>>>>
>>>>> Mike
>>>>>
>>>>
>>>>
>>>
>>
>
