A second release candidate has been built, which fixes the issues discovered in RC0. It can be downloaded from the same place.

Karl

On Wed, Sep 17, 2014 at 7:36 AM, Karl Wright <[email protected]> wrote:

> There is now a release candidate for 1.7.1 that can be downloaded and
> installed at http://people.apache.org/~kwright/apache-manifoldcf-1.7.1 .
>
> Thanks!
> Karl
>
>
> On Wed, Sep 17, 2014 at 7:05 AM, Aeham Abushwashi <
> [email protected]> wrote:
>
>> Thanks Erlend and Karl
>>
>> On 17 September 2014 12:03, Karl Wright <[email protected]> wrote:
>>
>>> Yes, this problem was introduced in 1.6.
>>>
>>> Karl
>>>
>>> Sent from my Windows Phone
>>> From: Erlend Garåsen
>>> Sent: 9/17/2014 6:06 AM
>>> To: [email protected]
>>> Subject: Re: Zookeeper configured MCF not working in production mode
>>>
>>> I guess the issue affects version 1.6.x as well. We had exactly the same
>>> problem with that version, but unfortunately I have no thread dump from
>>> that time to investigate.
>>>
>>> Erlend
>>>
>>> On 17.09.14 12:01, Aeham Abushwashi wrote:
>>> > Thanks for finding and fixing the issue. Could you confirm whether it
>>> > affects 1.6.x? A quick look at ZooKeeperConnection.obtainWriteLock() in
>>> > 1.6.1 shows the same pattern identified in CONNECTORS-1031 -
>>> > https://issues.apache.org/jira/browse/CONNECTORS-1031?focusedCommentId=14135978
>>> >
>>> > On 16 September 2014 22:19, Karl Wright <[email protected]
>>> > <mailto:[email protected]>> wrote:
>>> >
>>> > I believe I've fixed the problem for real. There's a patch attached
>>> > to the CONNECTORS-1031 ticket, which should be applicable to 1.7.
>>> > The fix is already checked into the dev_1x branch, as well as trunk
>>> > (which is MCF 2.0, so don't use that yet).
>>> >
>>> > I also believe that we're going to need to make a 1.7.1 release that
>>> > contains this fix, and others of similar importance.
>>> >
>>> > Karl
>>> >
>>> >
>>> > On Tue, Sep 16, 2014 at 9:15 AM, Karl Wright <[email protected]
>>> > <mailto:[email protected]>> wrote:
>>> >
>>> > After some research, I found that increasing the zookeeper.cfg
>>> > tick time count from 2000 to 5000 makes this problem go away for me.
>>> >
>>> > Clearly we have an issue, still, with resetting zookeeper
>>> > connections after tick timeout failures. The connections are
>>> > reset but the state of the connections are somehow incorrect.
>>> > I'll need to do more research to figure out how this can be
>>> > addressed.
>>> >
>>> > For the interim, increasing the tick time seems to be a
>>> > reasonable workaround.
>>> >
>>> > Thanks,
>>> > Karl
>>> >
>>> >
>>> > On Tue, Sep 16, 2014 at 8:14 AM, Karl Wright <[email protected]
>>> > <mailto:[email protected]>> wrote:
>>> >
>>> > Believe it or not, I was able to reproduce this here with a
>>> > crawl of 100000 documents. I get this in the Zookeeper
>>> > server-side log, hundreds of times:
>>> >
>>> > >>>>>>
>>> > [SyncThread:0] ERROR org.apache.zookeeper.server.NIOServerCnxn - Unexpected Exception:
>>> > java.nio.channels.CancelledKeyException
>>> >     at sun.nio.ch.SelectionKeyImpl.ensureValid(SelectionKeyImpl.java:73)
>>> >     at sun.nio.ch.SelectionKeyImpl.interestOps(SelectionKeyImpl.java:77)
>>> >     at org.apache.zookeeper.server.NIOServerCnxn.sendBuffer(NIOServerCnxn.java:153)
>>> >     at org.apache.zookeeper.server.NIOServerCnxn.sendResponse(NIOServerCnxn.java:1076)
>>> >     at org.apache.zookeeper.server.FinalRequestProcessor.processRequest(FinalRequestProcessor.java:170)
>>> >     at org.apache.zookeeper.server.SyncRequestProcessor.flush(SyncRequestProcessor.java:167)
>>> >     at org.apache.zookeeper.server.SyncRequestProcessor.run(SyncRequestProcessor.java:101)
>>> > [SyncThread:0] ERROR org.apache.zookeeper.server.NIOServerCnxn - Unexpected Exception:
>>> > java.nio.channels.CancelledKeyException
>>> >     at sun.nio.ch.SelectionKeyImpl.ensureValid(SelectionKeyImpl.java:73)
>>> >     at sun.nio.ch.SelectionKeyImpl.interestOps(SelectionKeyImpl.java:77)
>>> >     at org.apache.zookeeper.server.NIOServerCnxn.sendBuffer(NIOServerCnxn.java:153)
>>> >     at org.apache.zookeeper.server.NIOServerCnxn.sendResponse(NIOServerCnxn.java:1076)
>>> >     at org.apache.zookeeper.server.FinalRequestProcessor.processRequest(FinalRequestProcessor.java:170)
>>> >     at org.apache.zookeeper.server.SyncRequestProcessor.flush(SyncRequestProcessor.java:167)
>>> >     at org.apache.zookeeper.server.SyncRequestProcessor.run(SyncRequestProcessor.java:101)
>>> > <<<<<<
>>> >
>>> > ... and then everything locks up. I have no idea what is
>>> > happening; seems to be an NIO exception ZooKeeper is not
>>> > expecting.
>>> >
>>> > Karl
>>> >
>>> >
>>> > On Tue, Sep 16, 2014 at 7:52 AM, Erlend Garåsen
>>> > <[email protected] <mailto:[email protected]>>
>>> > wrote:
>>> >
>>> >
>>> > Ouch, I forgot to place the Zookeeper logs on web. Since
>>> > they do not include timestamps and I have restarted MCF
>>> > after a few changes, I guess it will be difficult to get
>>> > the relevant lines. I'll do that next time it hangs,
>>> > probably in the end of the day.
>>> >
>>> > I will add the new Zookeeper configuration settings as
>>> > Lalit suggested next time I'm restarting MCF.
>>> >
>>> >     How many worker threads are you using? How many
>>> >     documents (about) do
>>> >     you crawl before things hang?
>>> >
>>> >
>>> > Throttling -> max connections: 30
>>> > Throttling -> Max fetches/min: 100
>>> > Bandwith -> max connections: 25
>>> > Bandwith -> max kbytes/sec: 8000
>>> > Bandwith -> max fetches/min: 20
>>> >
>>> > I have four jobs configured. The one I'm running now has
>>> > 100,000 documents configured. Totally around 110,000
>>> > documents for all four jobs.
>>> >
>>> > I guess there are more documents involved since the
>>> > largest job excludes a lot of documents based on
>>> > sophisticated and complex filtering rules. Maybe 50%
>>> > more even though they are not added to Solr (but they
>>> > are of course fetched).
>>> >
>>> > Erlend
>>> >
>>> >
>>> >     You may also want to try to increase the parameter:
>>> >     maxClientCnxns in
>>> >     zookeeper.cfg to something bigger, if you have a lot
>>> >     of worker threads.
>>> >     I'm thinking 1000 or some such. See if it makes a
>>> >     difference for you.
>>> >
>>> >
>>> > I'll try that at next restart.
>>> >
>>> > Erlend
>>> >
>>>
>>
>
