GitHub user nxmbriggs404 opened a pull request: https://github.com/apache/kafka/pull/3283
KAFKA-2170: Updated Fixes For Windows Platform During stress testing of kafka 0.10.2.1 on a Windows platform, our group has encountered some issues that appear to be known to the community but not fully addressed by kafka. Using: https://github.com/apache/kafka/pull/154 as a guide, we have made derived changes to the source code and automated tests such that the "clients" and "core" tests pass for us on Windows and Linux platforms. Our stress tests succeed as well. This pull request adapts those changes to merge and build with kafka/trunk. The "clients" and "core" tests from kafka/trunk pass on Linux for us with these changes in place, and all tests pass on Windows except: ConsumerBounceTest (intermittent failures) TransactionsTest DeleteTopicTest.testDeleteTopicWithCleaner EpochDrivenReplicationProtocolAcceptanceTest.offsetsShouldNotGoBackwards Our intention is to help efforts to further kafka support for the Windows platform. Our changes are the work of engineers from Nexidia building upon the work found in the aforementioned pull request link, and they are contributed to the community per kafka's open source license. We welcome all feedback and look forward to working with the kafka community. Matt Briggs Principal Software Engineer Nexidia, a NICE Analytics Company www.nexidia.com You can merge this pull request into a Git repository by running: $ git pull https://github.com/nxmbriggs404/kafka nx-windows-fixes Alternatively you can review and apply these changes as the patch at: https://github.com/apache/kafka/pull/3283.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #3283 ---- commit 6ee3c167c6e2daa8ce4564d98f9f63967a0efece Author: Matt Briggs <mbri...@nexidia.com> Date: 2017-06-06T15:10:58Z Handle log file deletion and renaming on Windows Special treatment is needed for the deletion and renaming of log and index files on Windows, due to the latter's general inability to perform those operations while a file is opened or memory mapped. The changes in this commit are essentially adapted from: https://github.com/apache/kafka/pull/154 More detailed background information on the issues can also be found via that link. commit a0cd773a8d89d7df90fc75ce55a46fd8bb93d368 Author: Matt Briggs <mbri...@nexidia.com> Date: 2017-06-06T15:21:23Z Colliding log filenames cause test failures on Windows This commit addresses an edge case with compaction and asynchronous deletion of log files initially encountered when debugging: LogCleanerTest.testRecoveryAfterCrash failures on Windows. It appears that troubles arise when compaction of logs results in two segments having the same base address, hence the same file names, and the segments get scheduled for background deletion. If one segment's files are pending deletion at the time the other segment's files are scheduled for deletion, the file rename attempted during the latter will fail on Windows (due to the typical Windows issues with open/memory-mapped files). It doesn't appear like we can simply close out the former files, as it seems that kafka intends to have them open for concurrent readers until the file deletion interval has fully passed. The fix in this commit basically sidesteps the issue by ensuring files scheduled for background delete are renamed uniquely (by injecting a UUID into the filename). Essentially this follows the approach taken with LogManager.asyncDelete and Log.DeleteDirSuffix. Collision related errors were also observed when running a custom stress test on Windows against a standalone kafka server. The test code caused extremely frequent compaction of the __consumer_offsets topic partitions, which resulted in collisions of the latter's log files when they were scheduled for deletion. Use of the UUID was successful in avoiding collision related issues in this context. commit 3633d493bc3c0de3f177eecd11e374be74d4ac32 Author: Matt Briggs <mbri...@nexidia.com> Date: 2017-06-06T15:27:59Z Fixing log recovery crash on Windows When a sanity check failure was detected by log recovery code, an attempt to delete index files that were memory-mapped would lead to a crash on Windows. This commit adjusts the code to unmap, delete, recreate, and remap index files such the recovery can continue. Issue was found via the LogTest.testCorruptLog test commit 6b8debd2b906d8691dba73fbbc917f10febf4959 Author: Matt Briggs <mbri...@nexidia.com> Date: 2017-06-06T19:11:31Z Windows-driven resource cleanup in tests Fix exceptions encountered when running automated tests on Windows, arising from the latter's issues with removal/renaming of opened/memory-mapped files. Includes changes adapted from: https://github.com/apache/kafka/pull/154 commit f0c6af8997f1b05a0d29b211dd6cf0d587370f8e Author: Matt Briggs <mbri...@nexidia.com> Date: 2017-06-07T18:29:18Z Fixing timing related ProducerTest failures Discovered when running on Windows virtual machines commit 2b9a9e2bafd8b7fb48f54c0813641e22508be58c Author: Matt Briggs <mbri...@nexidia.com> Date: 2017-06-07T18:37:55Z Dangling open lock files cause Windows test failures Log manager code would fail to close an open lock file if it was unsuccessful in actually locking the file. On Windows this would lead to automated test failures as cleanup code attempted to remove the directory while the lock file remained open. This could be seen for example in: LogManagerTest.testTwoLogManagersUsingSameDirFails commit cb083c0ccd2a220c983e65802df1eb16e65b109d Author: Matt Briggs <mbri...@nexidia.com> Date: 2017-06-07T15:05:49Z Fixed SASL test failures on Windows JAAS config file written by test code contained a keytab location which caused numerous SASL tests to fail. Issue was the keytab location containing backslashes on Windows, which were escaped when writing out the the JAAS config. Switching over to Unix forward slashes resolved the issue. commit 319ce1d2da9ebac2d99b384de871e34f9d184a84 Author: Matt Briggs <mbri...@nexidia.com> Date: 2017-06-07T17:00:37Z Fixed checkstyle suppressions for Windows Some suppressions were not being obeyed, resulting in checkstyle failures on Windows but not linux. Root issue is described here: http://rolf-engelhard.de/2012/11/using-checkstyles-suppression-filters-on-windows-and-linux/ ---- --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---