There is now a release candidate for 1.7.1, which can be downloaded from http://people.apache.org/~kwright/apache-manifoldcf-1.7.1 .
Thanks!
Karl

On Wed, Sep 17, 2014 at 7:05 AM, Aeham Abushwashi <[email protected]> wrote:

> Thanks Erlend and Karl
>
> On 17 September 2014 12:03, Karl Wright <[email protected]> wrote:
>
>> Yes, this problem was introduced in 1.6.
>>
>> Karl
>>
>> Sent from my Windows Phone
>> From: Erlend Garåsen
>> Sent: 9/17/2014 6:06 AM
>> To: [email protected]
>> Subject: Re: Zookeeper configured MCF not working in production mode
>>
>> I guess the issue affects version 1.6.x as well. We had exactly the same
>> problem with that version, but unfortunately I have no thread dump from
>> that time to investigate.
>>
>> Erlend
>>
>> On 17.09.14 12:01, Aeham Abushwashi wrote:
>> > Thanks for finding and fixing the issue. Could you confirm whether it
>> > affects 1.6.x? A quick look at ZooKeeperConnection.obtainWriteLock() in
>> > 1.6.1 shows the same pattern identified in CONNECTORS-1031:
>> > https://issues.apache.org/jira/browse/CONNECTORS-1031?focusedCommentId=14135978
>> >
>> > On 16 September 2014 22:19, Karl Wright <[email protected]> wrote:
>> >
>> > I believe I've fixed the problem for real. There's a patch attached
>> > to the CONNECTORS-1031 ticket, which should be applicable to 1.7.
>> > The fix is already checked into the dev_1x branch, as well as trunk
>> > (which is MCF 2.0, so don't use that yet).
>> >
>> > I also believe that we're going to need to make a 1.7.1 release that
>> > contains this fix, and others of similar importance.
>> >
>> > Karl
>> >
>> > On Tue, Sep 16, 2014 at 9:15 AM, Karl Wright <[email protected]> wrote:
>> >
>> > After some research, I found that increasing the zookeeper.cfg tick
>> > time setting from 2000 to 5000 makes this problem go away for me.
>> >
>> > Clearly we still have an issue with resetting zookeeper connections
>> > after tick timeout failures. The connections are reset, but the state
>> > of the connections is somehow incorrect. I'll need to do more research
>> > to figure out how this can be addressed.
>> >
>> > In the interim, increasing the tick time seems to be a reasonable
>> > workaround.
>> >
>> > Thanks,
>> > Karl
>> >
>> > On Tue, Sep 16, 2014 at 8:14 AM, Karl Wright <[email protected]> wrote:
>> >
>> > Believe it or not, I was able to reproduce this here with a crawl of
>> > 100,000 documents. I get this in the Zookeeper server-side log,
>> > hundreds of times:
>> >
>> > >>>>>>
>> > [SyncThread:0] ERROR org.apache.zookeeper.server.NIOServerCnxn - Unexpected Exception:
>> > java.nio.channels.CancelledKeyException
>> >         at sun.nio.ch.SelectionKeyImpl.ensureValid(SelectionKeyImpl.java:73)
>> >         at sun.nio.ch.SelectionKeyImpl.interestOps(SelectionKeyImpl.java:77)
>> >         at org.apache.zookeeper.server.NIOServerCnxn.sendBuffer(NIOServerCnxn.java:153)
>> >         at org.apache.zookeeper.server.NIOServerCnxn.sendResponse(NIOServerCnxn.java:1076)
>> >         at org.apache.zookeeper.server.FinalRequestProcessor.processRequest(FinalRequestProcessor.java:170)
>> >         at org.apache.zookeeper.server.SyncRequestProcessor.flush(SyncRequestProcessor.java:167)
>> >         at org.apache.zookeeper.server.SyncRequestProcessor.run(SyncRequestProcessor.java:101)
>> > [SyncThread:0] ERROR org.apache.zookeeper.server.NIOServerCnxn - Unexpected Exception:
>> > java.nio.channels.CancelledKeyException
>> >         at sun.nio.ch.SelectionKeyImpl.ensureValid(SelectionKeyImpl.java:73)
>> >         at sun.nio.ch.SelectionKeyImpl.interestOps(SelectionKeyImpl.java:77)
>> >         at org.apache.zookeeper.server.NIOServerCnxn.sendBuffer(NIOServerCnxn.java:153)
>> >         at org.apache.zookeeper.server.NIOServerCnxn.sendResponse(NIOServerCnxn.java:1076)
>> >         at org.apache.zookeeper.server.FinalRequestProcessor.processRequest(FinalRequestProcessor.java:170)
>> >         at org.apache.zookeeper.server.SyncRequestProcessor.flush(SyncRequestProcessor.java:167)
>> >         at org.apache.zookeeper.server.SyncRequestProcessor.run(SyncRequestProcessor.java:101)
>> > <<<<<<
>> >
>> > ... and then everything locks up. I have no idea what is happening;
>> > it seems to be an NIO exception that ZooKeeper is not expecting.
>> >
>> > Karl
>> >
>> > On Tue, Sep 16, 2014 at 7:52 AM, Erlend Garåsen <[email protected]> wrote:
>> >
>> > Ouch, I forgot to put the Zookeeper logs on the web. Since they do not
>> > include timestamps and I have restarted MCF after a few changes, I
>> > guess it will be difficult to find the relevant lines. I'll do that
>> > the next time it hangs, probably at the end of the day.
>> >
>> > I will add the new Zookeeper configuration settings Lalit suggested
>> > the next time I restart MCF.
>> >
>> > > How many worker threads are you using? Roughly how many documents
>> > > do you crawl before things hang?
>> >
>> > Throttling -> max connections: 30
>> > Throttling -> max fetches/min: 100
>> > Bandwidth -> max connections: 25
>> > Bandwidth -> max kbytes/sec: 8000
>> > Bandwidth -> max fetches/min: 20
>> >
>> > I have four jobs configured. The one I'm running now has 100,000
>> > documents configured. In total there are around 110,000 documents
>> > across all four jobs.
>> >
>> > I guess more documents are actually involved, since the largest job
>> > excludes a lot of documents based on sophisticated and complex
>> > filtering rules. Maybe 50% more, even though they are not added to
>> > Solr (they are of course still fetched).
>> >
>> > Erlend
>> >
>> > > You may also want to try increasing the maxClientCnxns parameter in
>> > > zookeeper.cfg to something bigger, if you have a lot of worker
>> > > threads. I'm thinking 1000 or some such. See if it makes a
>> > > difference for you.
>> >
>> > I'll try that at the next restart.
>> >
>> > Erlend
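For readers hitting the same hang: both server-side workarounds mentioned in the thread above, raising the tick time and raising the per-client connection limit, are plain ZooKeeper server settings. Below is a minimal sketch of the relevant zookeeper.cfg lines, using the exact values suggested in the thread (5000 and 1000); these are the values that happened to help here, not tuned recommendations, and all other settings (dataDir, clientPort, and so on) stay as they are.

>>>>>>
# zookeeper.cfg -- ZooKeeper server configuration (relevant lines only)

# Basic time unit in milliseconds. Session timeouts are expressed as
# multiples of this value, so raising it from the default 2000 makes
# tick-timeout-driven session expirations less likely.
tickTime=5000

# Maximum number of concurrent connections a single client host may open
# to this server. Raise it when many MCF worker threads each hold their
# own ZooKeeper connection.
maxClientCnxns=1000
<<<<<<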
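On the CONNECTORS-1031 point: ZooKeeperConnection.obtainWriteLock() is ManifoldCF's own code and is not reproduced here. Purely as an illustration of the kind of code path under discussion, the sketch below shows the standard ZooKeeper ephemeral-sequential write-lock recipe using the stock org.apache.zookeeper client API. The class name, lock paths, and recovery behaviour are hypothetical and not the actual MCF implementation; the relevant observation for this thread is that a connection reset or session change in the middle of acquire() has to leave the lock node and retry state consistent, which is the general area the messages above are describing.

>>>>>>
import java.util.Collections;
import java.util.List;
import java.util.concurrent.CountDownLatch;

import org.apache.zookeeper.CreateMode;
import org.apache.zookeeper.KeeperException;
import org.apache.zookeeper.WatchedEvent;
import org.apache.zookeeper.Watcher;
import org.apache.zookeeper.ZooDefs;
import org.apache.zookeeper.ZooKeeper;

// Illustrative only: the standard ZooKeeper ephemeral-sequential exclusive
// lock recipe. This is NOT ManifoldCF's ZooKeeperConnection; names are made up.
public class ExampleWriteLock {

  private final ZooKeeper zk;
  private final String lockDir;  // e.g. "/example-locks/my-resource" (hypothetical)
  private String myNode;         // full path of the child node we created

  public ExampleWriteLock(ZooKeeper zk, String lockDir) {
    this.zk = zk;
    this.lockDir = lockDir;
  }

  // Blocks until the exclusive lock is held.
  public void acquire() throws KeeperException, InterruptedException {
    // An ephemeral sequential child: it is deleted automatically when the
    // session goes away, which is what ties the lock to connection state.
    myNode = zk.create(lockDir + "/write-", new byte[0],
        ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.EPHEMERAL_SEQUENTIAL);
    String myName = myNode.substring(myNode.lastIndexOf('/') + 1);

    while (true) {
      List<String> children = zk.getChildren(lockDir, false);
      Collections.sort(children);
      int myIndex = children.indexOf(myName);
      if (myIndex == 0) {
        return;  // lowest sequence number: we own the lock
      }
      if (myIndex < 0) {
        // Our node is gone, e.g. the session expired underneath us; a real
        // implementation must recover cleanly here rather than spin or hang.
        throw KeeperException.create(KeeperException.Code.SESSIONEXPIRED);
      }
      // Otherwise wait for the child immediately ahead of us to disappear,
      // then loop and re-check.
      String predecessor = lockDir + "/" + children.get(myIndex - 1);
      final CountDownLatch gone = new CountDownLatch(1);
      Watcher watcher = new Watcher() {
        public void process(WatchedEvent event) {
          gone.countDown();
        }
      };
      if (zk.exists(predecessor, watcher) != null) {
        gone.await();
      }
    }
  }

  // Releases the lock by deleting our node.
  public void release() throws KeeperException, InterruptedException {
    if (myNode != null) {
      zk.delete(myNode, -1);
      myNode = null;
    }
  }
}
<<<<<<

For ManifoldCF itself, the actual fix is the patch attached to the CONNECTORS-1031 ticket, as described earlier in the thread.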
