Hi Patrick,

I will check whether the servers are underprovisioned. I have one cluster with 1 Windows server + 2 Linux servers. I will submit the patch for the tests.

Thanks.
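
P.S. For the underprovisioning check, here is a minimal sketch of how each server could be polled with the "stat" four-letter word while the load test runs, pulling out the "min/avg/max latency" figures mentioned below. The function names, the timeout and the regular expression are my own assumptions for illustration and are not part of ZooKeeper or zktop; the host and port have to be supplied by the caller.

```python
import re
import socket

# Matches the latency line of the "stat" reply, e.g. "Latency min/avg/max: 0/2/87".
# Decimals are accepted in case a server reports a fractional average (assumption).
LATENCY_RE = re.compile(r"Latency min/avg/max:\s*([\d.]+)/([\d.]+)/([\d.]+)")


def four_letter_word(host, port, word="stat", timeout=5.0):
    """Send a four-letter command to one server and return the raw reply text."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(word.encode("ascii"))
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:  # the server closes the connection after replying
                break
            chunks.append(data)
    return b"".join(chunks).decode("utf-8", errors="replace")


def parse_latency(stat_output):
    """Return (min, avg, max) latency as floats, or None if no latency line is found."""
    match = LATENCY_RE.search(stat_output)
    if match is None:
        return None
    return tuple(float(value) for value in match.groups())
```

Calling parse_latency(four_letter_word(host, port)) for each of the three servers every few seconds during generateLoad should show whether the maximum latency climbs on the Windows box before the session-closed exceptions start.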

On Fri, Apr 9, 2010 at 12:46 PM, Patrick Hunt <ph...@apache.org> wrote:

> On 04/09/2010 07:01 AM, Vishal K wrote:
>
>> Hi Patrick,
>>
>> So far I have done the following:
>>
>> 1. Ran junit tests on both Windows and Linux. I ran into a few issues with
>> the tests: https://issues.apache.org/jira/browse/ZOOKEEPER-734. Also, the
>> path to dataDir in zoo.cfg should be either a UNIX-style path or should
>> contain double backslashes instead of one. I think the path compatibility
>> issue is the only bug found so far. I was a bit skeptical earlier about the
>> junit test results since the logs of the junit tests on Windows differ from
>> those on Linux. But as you mentioned in your earlier reply, it is probably
>> a difference in networking libraries (I am using the same Java version on
>> both).
>
> Ok, great. I've slated 734 for 3.3.1/3.4.0. If you'd like to contribute a
> patch please feel free (you need to submit as an attachment rather than
> inlining in the desc, see this link for details):
> http://wiki.apache.org/hadoop/ZooKeeper/HowToContribute
>
>> 2. Created a 1 Windows + 2 Linux cluster. Did some hand testing and things
>> seem to work fine.
>
> 3 distinct clusters you mean or 3 servers; 1 on windows and 2 on linux?
> (that would be an interesting test btw :-) )
>
>> 3. I tried to run smoketest on the cluster, but I had some issues with the
>> test. I will be trying to run this test again soon.
>
> Ben has been out of the office, once he gets back he's the best person to
> shed light. If you have specific questions someone else might be able to
> shed light in the mean time.
>
>> 4. I tried systest and generateLoad test. systest passed but generateLoad
>> test started spewing a lot of connection/session closed exceptions.
>
> You might be underprovisioning the server(s)? Try monitoring the servers
> while running the test. You can use something like
> http://github.com/phunt/zktop
> esp the "min/avg/max latency" information.
>
> My experience on windows is that the IO is not as good as unix (esp going
> through java). You should def. test under some load (using the latency
> tester as I mentioned earlier, similar to my tests on the wiki) to ensure
> that the system will function in this environment. Ensure that you have
> enough heap, you might need to tune the GC to turn on cms/incremental GC
> (that could also be the issue you saw earlier), and also monitor the disk
> activity.
>
> Btw, have you tested the 4 letter words? stat and such I mean. Would be
> good to verify as well as they are critical for monitoring the system once
> in production.
>
>> I hope to run systest, generateloadtest, smoketest and latencytest. If we
>> decide to go with ZK on Windows, then we will also write our own tests
>> (and do more rigorous QA). I will be happy to add these to the ZK test
>> suite as well.
>
> Great, happy to have them. See the how to contribute link above.
>
>> I believe at the end of this exercise we will have a fair amount of
>> confidence in ZK on Windows. I will keep you posted.
>
> That will be great. Having an actual user who's running in production under
> that configuration would bring even more weight.
>
> At some point we should look into the C client side of things as well. That
> will be harder, but if we can do it it would mean that python/perl/etc...
> would also be available for the platform.
>
> Thanks!
>
> Patrick
>
>> On Fri, Apr 9, 2010 at 1:01 AM, Patrick Hunt <ph...@apache.org> wrote:
>>
>>> Vishal, in general how are things going with your windows evaluation? Can
>>> you give a short review of what you've tested, what's working and what
>>> you still would like to look at? You are testing under windows directly,
>>> correct? (not cygwin I mean). Any chance you will be able to test the c,
>>> python and/or perl bindings? (although this would mean porting to windows
>>> compiler if i understand correctly).
>>>
>>> We should consider updating the documentation to include detail about
>>> windows support if you feel comfortable. Enter a JIRA for this if you
>>> agree. Perhaps a wiki page detailing status and any issues running on that
>>> platform might also be useful? It would be great to offer (with confidence)
>>> ZK to users of the windows platform.
>>>
>>> Thanks for taking on this effort, regards,
>>>
>>> Patrick
>>>
>>> On 04/08/2010 03:21 PM, Vishal K wrote:
>>>
>>>> Hi Patrick, Mahadev,
>>>>
>>>> I was going through the logs of junit tests (all of the tests passed). I
>>>> noticed the logs for windows are different from the logs on linux.
>>>> On windows, the logs show the following "Failed to send last message"
>>>> error message several times. None of the logs on Linux have them.
>>>> Are these messages normal? I have also attached the log file for the
>>>> AsyncHammerTest.
>>>>
>>>> 2010-04-08 18:07:35,750 - WARN [Thread-80:quorumcnxmanager$sendwor...@586] - Exception when using channel: 4
>>>> java.nio.channels.ClosedChannelException
>>>>     at sun.nio.ch.SocketChannelImpl.ensureWriteOpen(SocketChannelImpl.java:126)
>>>>     at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:324)
>>>>     at org.apache.zookeeper.server.quorum.QuorumCnxManager$SendWorker.send(QuorumCnxManager.java:548)
>>>>     at org.apache.zookeeper.server.quorum.QuorumCnxManager$SendWorker.run(QuorumCnxManager.java:578)
>>>> 2010-04-08 18:07:35,750 - WARN [Thread-80:quorumcnxmanager$sendwor...@589] - Send worker leaving thread
>>>> 2010-04-08 18:07:35,750 - WARN [Thread-95:quorumcnxmanager$recvwor...@658] - Connection broken:
>>>> java.io.IOException: Channel eof
>>>>     at org.apache.zookeeper.server.quorum.QuorumCnxManager$RecvWorker.run(QuorumCnxManager.java:630)
>>>> 2010-04-08 18:07:35,750 - ERROR [Thread-92:quorumcnxmanager$sendwor...@559] - Failed to send last message. Shutting down thread.
>>>> java.nio.channels.ClosedChannelException
>>>>     at sun.nio.ch.SocketChannelImpl.ensureWriteOpen(SocketChannelImpl.java:126)
>>>>     at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:324)
>>>>     at org.apache.zookeeper.server.quorum.QuorumCnxManager$SendWorker.send(QuorumCnxManager.java:548)
>>>>     at org.apache.zookeeper.server.quorum.QuorumCnxManager$SendWorker.run(QuorumCnxManager.java:557)
>>>> 2010-04-08 18:07:35,750 - WARN [Thread-92:quorumcnxmanager$sendwor...@589] - Send worker leaving threa
>>>>
>>>> Thanks.
>>>>
>>>> Regards,
>>>> -Vishal
>>>>
>>>> On Wed, Apr 7, 2010 at 5:53 PM, Patrick Hunt <ph...@apache.org> wrote:
>>>>
>>>> I can't tell for certain. Looking a bit closer the code that starts
>>>> the server for the c tests does reference the clover jar, so my
>>>> guess would be that it does, but the messages from clover and the
>>>> build, etc.. don't shed any light. I didn't setup the zk clover
>>>> stuff for hudson, cc'ing Giri who might have insight.
>>>>
>>>> Patrick
>>>>
>>>> On 04/07/2010 02:34 PM, Vishal K wrote:
>>>>
>>>> Hi Patrick,
>>>>
>>>> We are not using C clients so we are not worried about porting them to
>>>> Windows. Just out of curiosity, is there any way to confirm if the code
>>>> coverage results are a result of test-core-java? If not, I will run the
>>>> coverage tools locally.
>>>>
>>>> I had tried the smoketests earlier, but I ran into some issues there. I
>>>> forgot to note down the problem. I will revisit and upload the results.
>>>>
>>>> Thanks for your help.
>>>>
>>>> On Wed, Apr 7, 2010 at 5:11 PM, Patrick Hunt <ph...@apache.org> wrote:
>>>>
>>>> On 04/07/2010 01:51 PM, Vishal K wrote:
>>>>
>>>> Hi Mahadev,
>>>>
>>>> Thanks for your response. Currently I am running ZK without cygwin on
>>>> windows. I will give it a try on cygwin.
>>>> I am not quite familiar with cppunit. Why will cppunit give me more
>>>> confidence in native windows libraries?
>>>>
>>>> I have a few more questions relevant to testing:
>>>>
>>>> 1. How much code coverage do we get with "test-core-java"? I see 68.8%
>>>> coverage on hudson
>>>> http://hudson.zones.apache.org/hudson/job/ZooKeeper-trunk/clover/.
>>>> Does this measure the coverage from junit tests run from
>>>> "test-core-java"?
>>>>
>>>> Right, afaik that's test-core-java.
>>>>
>>>> I think if you plan to run in production under windows you would
>>>> want to run in win, not cygwin. If that's the case you'd have to
>>>> port the c client and c client tests to some windows specific compiler.
>>>>
>>>> 2. What would be a good and reliable set of tests that will help me
>>>> verify that the cluster is holding up fine on windows? I tried to run
>>>> systest and generateload. But I am having issues with the tests (and also
>>>> understanding the output of the tests since I am not familiar with the
>>>> source). The systest did exit with output something like Test OK (1). I
>>>> presume this is a good sign :-) generateLoad crashed and I will look into
>>>> it later. Please let me know if you have any suggestions. Last few lines
>>>> are shown below:
>>>>
>>>> Well the best thing would be to run one's own tests for their
>>>> specific use cases. Short of that you've done the core set of tests
>>>> that we have available to us. You might try the smoke/latency tests:
>>>> http://github.com/phunt/zk-smoketest
>>>> in particular you can run the latency tests from a number of clients
>>>> in parallel and see the results.
>>>> http://wiki.apache.org/hadoop/ZooKeeper/ServiceLatencyOverview
>>>>
>>>> Patrick
>>>>
>>>> 2010-04-07 14:44:20,055 - WARN [QuorumPeer:/0.0.0.0:3155:quorump...@662] - QuorumPeer main thread exited
>>>> Got rc = -4
>>>> Got rc = -4
>>>> [many such messages as above]
>>>> WatchedEvent state:Disconnected type:None path:null
>>>> java.lang.InterruptedException: sleep interrupted
>>>>     at java.lang.Thread.sleep(Native Method)
>>>>     at org.apache.zookeeper.test.system.GenerateLoad$GeneratorInstance$SenderThread.run(GenerateLoad.java:425)
>>>> java.lang.InterruptedException
>>>>     at java.lang.Object.wait(Native Method)
>>>>     at java.lang.Object.wait(Object.java:485)
>>>>     at org.apache.zookeeper.test.system.GenerateLoad$GeneratorInstance$ZooKeeperThread.incOutstanding(GenerateLoad.java:305)
>>>>     at org.apache.zookeeper.test.system.GenerateLoad$GeneratorInstance$ZooKeeperThread.run(GenerateLoad.java:353)
>>>>
>>>> 2010-04-07 14:44:20,711 - INFO [Thread-42-SendThread(vkher-devd:3155):clientcnxn$sendthr...@1000] - Opening socket connection to server<host>/<IP>:48214
>>>> Got rc = -4
>>>> 2010-04-07 14:44:22,008 - INFO [Thread-42:zookee...@538] - Session:0x127d97600cc0000 closed
>>>>
>>>> Thanks.
>>>>
>>>> Regards,
>>>> -Vishal
>>>>
>>>> On Wed, Apr 7, 2010 at 3:55 PM, Mahadev Konar <maha...@yahoo-inc.com> wrote:
>>>>
>>>> Hi Vishal,
>>>>
>>>> It would be a good thing to actually get cppunit working on windows
>>>> (rather than dropping it) since it would make you more confident on being
>>>> able to use the native libraries for windows.
>>>>
>>>> Though there is already an open jira to try and compile libraries without
>>>> CPPUNIT being installed on the machines.
>>>>
>>>> http://issues.apache.org/jira/browse/ZOOKEEPER-316
>>>>
>>>> Would you want to try and take a shot at fixing the cppunit tests?
>>>>
>>>> Would be great to have cppunit tests working on cygwin!
>>>>
>>>> Thanks
>>>> mahadev
>>>>
>>>> On 4/6/10 4:15 PM, "Vishal K" <vishalm...@gmail.com> wrote:
>>>>
>>>> Hi,
>>>>
>>>> I had a few minor problems
>>>> (https://issues.apache.org/jira/browse/ZOOKEEPER-734) after which all
>>>> junits passed (I ran ant test-core-java).
>>>>
>>>> But the build failed later in create-cppunit-configure:
>>>>
>>>> -------------
>>>> test-core-java:
>>>> call-test-cppunit:
>>>> init:
>>>> check-cppunit-makefile:
>>>> create-cppunit-makefile:
>>>> init:
>>>> check-cppunit-configure:
>>>> create-cppunit-configure:
>>>> [mkdir] Created dir: C:\zookeeper\zookeeper-3.3.0\build\test\test-cppunit
>>>> BUILD FAILED
>>>> C:\zookeeper\zookeeper-3.3.0\build.xml:907: The following error occurred while executing this line:
>>>> C:\zookeeper\zookeeper-3.3.0\build.xml:865: The following error occurred while executing this line:
>>>> C:\zookeeper\zookeeper-3.3.0\build.xml:857: Execute failed: java.io.IOException: Cannot run program "C:\zookeeper\zookeeper-3.3.0\src\c\configure" (in directory "C:\zookeeper\zookeeper-3.3.0\build\test\test-cppunit"): CreateProcess error=193, %1 is not a valid Win32 application
>>>>     at java.lang.ProcessBuilder.start(ProcessBuilder.java:459)
>>>>     at java.lang.Runtime.exec(Runtime.java:593
>>>> -----------
>>>>
>>>> cppunit tests should probably be dropped for windows.
>>>>
>>>> Also, one point to note: the dataDir in zoo.cfg should have a UNIX path
>>>> (or double backslash instead of single backslash).
>>>> I suppose all Java developers might be already aware of that.
>>>>
>>>> Regards,
>>>> -Vishal
>>>>
>>>> On Fri, Apr 2, 2010 at 1:46 PM, Patrick Hunt <ph...@apache.org> wrote:
>>>>
>>>> Pretty seamless, just do a rolling upgrade (see the faq) of the servers.
>>>> Then upgrade your clients. Code APIs on the client are all b/w compat.
>>>>
>>>> Patrick
>>>>
>>>> On 04/02/2010 10:41 AM, Vishal K wrote:
>>>>
>>>> Hi Patrick,
>>>>
>>>> We have not upgraded to 3.3.0 yet. We are using 3.2.2. I did notice the
>>>> windows specific batch files in 3.3.0. How seamless is the upgrade from
>>>> 3.2.2 -> 3.3.0?
>>>>
>>>> I will let you know if I run into any windows related problems. Thanks.
>>>>
>>>> On Fri, Apr 2, 2010 at 11:40 AM, Patrick Hunt <ph...@apache.org> wrote:
>>>>
>>>> Are you using 3.3.0? 3.3.0 included a number of fixes for cygwin and
>>>> includes windows specific batch files. If you are planning to deploy to
>>>> production on windows I'd encourage you to develop under windows directly
>>>> as well.
>>>>
>>>> If you find issues, bugs, etc... be sure to enter JIRAs. Don't worry, you
>>>> won't hurt our feelings, on the contrary we'll be happy if you find/fix
>>>> issues on windows and make things better for everyone. (just make sure
>>>> you are using the latest release).
>>>>
>>>> Regards,
>>>>
>>>> Patrick
>>>>
>>>> On 04/02/2010 07:05 AM, Vishal K wrote:
>>>>
>>>> Hi,
>>>>
>>>> I was able to start zookeeper on windows using cygwin.
>>>> I had to make minor changes to the shell scripts to use cygpath wherever
>>>> needed. I will run a few tests and post the progress. I grepped through
>>>> the zookeeper sources just to check if ZK is using any native code.
>>>> I didn't find any, but just to confirm: is ZK using native code?
>>>>
>>>> I have talked to a few guys around and they said it is fair to assume
>>>> that the programs are portable (to Windows) if they don't have native
>>>> code. Just wanted to check. Thanks.
>>>>
>>>> On Thu, Apr 1, 2010 at 10:09 AM, Vishal K <vishalm...@gmail.com> wrote:
>>>>
>>>> Hi Patrick,
>>>>
>>>> Thanks for your response. I will start running ZK on windows and let you
>>>> know if I run into issues.
>>>>
>>>> On Wed, Mar 31, 2010 at 11:32 AM, Patrick Hunt <ph...@apache.org> wrote:
>>>>
>>>> Vishal K wrote:
>>>>
>>>> We will be using zookeeper quite extensively for clustering. Windows is
>>>> one of the platforms that we may need to support. Since Win32 is not
>>>> supported as a production platform I was wondering to what extent
>>>> zookeeper is tested on windows. We are also interested in using Zookeeper
>>>> on the Win64 platform. Is Win64 supported? Are there any plans to support
>>>> Win32/Win64 for production? If not, what would one need to do to support
>>>> windows and what would be the estimated QA effort?
>>>>
>>>> My use of ZK is exclusively 32/64bit linux, however I can tell you that
>>>> given that the client/server are implemented in java it should work.
>>>> Problems you might encounter would be things like NIO issues with the JVM
>>>> implementation on windows.
>>>>
>>>> Testing on windows? Pretty much 0 afaik. We do support development on
>>>> cygwin, so that provides some basic exercising of the codepaths with the
>>>> windows jvm, however it's not likely production level qa.
>>>>
>>>> This question (zk on win) has come up once or twice before, I haven't
>>>> seen any followup from the users who asked about it previously though.
>>>>
>>>> 3.3.0 has batch files for running the server in windows, give those a
>>>> try. Probably what you'd want to do is run "ant test-core-java" or
>>>> similar in the top of the ZK release directory. This will run all the
>>>> java tests and give you some insight into status. I'd be happy to work
>>>> with you to land patches that address issues with ZK on windows.
>>>> Depending on the interest level and support from win users we could
>>>> support win as a dev/prod platform at some point in the future - having
>>>> ongoing support for this would be important though (people interested in
>>>> testing/fixing under win I mean). Try exercising under windows and create
>>>> some JIRAs based on what you find.
>>>>
>>>> Regards,
>>>>
>>>> Patrick