[kudu-CR] [tests] fix test which fails with two cpus and document other dependencies
Brock Noland has posted comments on this change. Change subject: [tests] fix test which fails with two cpus and document other dependencies .. Patch Set 6: (1 comment) http://gerrit.cloudera.org:8080/#/c/4446/6//COMMIT_MSG Commit Message: PS6, Line 17: 2) the fact that capacity of SharedLRUCache is higher than the capacity : configured if the configured capacity is not divisible by the the number : of CPUs. : : For example, the capacity is set here: : : FLAGS_codegen_cache_capacity = 10; : : However, if the capacity is not perfectly divisible by the number of CPUs, : actual capacity is slightly higher. : : CPU 2 => Capacity 10, 5/shard : CPU 4 => Capacity 12, 3/shard : CPU 8 => Capacity 16, 2/shard : : Due to this calculation: : : const size_t per_shard = (capacity + (num_shards - 1)) / num_shards; > I appreciate the detailed explanation here, but I'm having a hard time unde Shoot, my explanation as subpar. I will update, but in each of those examples the configured capacity is 10, but the actual capacity is 10, 12, 16. More details coming. -- To view, visit http://gerrit.cloudera.org:8080/4446 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I81b70f63923078d449f6541a61b292517e49877d Gerrit-PatchSet: 6 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Brock NolandGerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Brock Noland Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Todd Lipcon Gerrit-Reviewer: Will Berkeley Gerrit-HasComments: Yes
[kudu-CR] [tests] fix test which fails with two cpus and document other dependencies
Adar Dembo has posted comments on this change. Change subject: [tests] fix test which fails with two cpus and document other dependencies .. Patch Set 6: (6 comments) http://gerrit.cloudera.org:8080/#/c/4446/6//COMMIT_MSG Commit Message: PS6, Line 16: SharedLRUCache Nit: this should be ShardedLRUCache. Below too. PS6, Line 17: 2) the fact that capacity of SharedLRUCache is higher than the capacity : configured if the configured capacity is not divisible by the the number : of CPUs. : : For example, the capacity is set here: : : FLAGS_codegen_cache_capacity = 10; : : However, if the capacity is not perfectly divisible by the number of CPUs, : actual capacity is slightly higher. : : CPU 2 => Capacity 10, 5/shard : CPU 4 => Capacity 12, 3/shard : CPU 8 => Capacity 16, 2/shard : : Due to this calculation: : : const size_t per_shard = (capacity + (num_shards - 1)) / num_shards; I appreciate the detailed explanation here, but I'm having a hard time understanding it without looking at the code too. Maybe you can work in how num_shards is calculated? And make explicit that the actual capacity is per_shard * num_shards? Also, in each of the examples you've provided, the capacity is perfectly divisible by the number of CPUs. Maybe add some negative examples? PS6, Line 47: shard's Nit: shards http://gerrit.cloudera.org:8080/#/c/4446/6/docs/installation.adoc File docs/installation.adoc: Please update the various script examples in this file, as well as the SLES instructions. http://gerrit.cloudera.org:8080/#/c/4446/6/src/kudu/codegen/codegen-test.cc File src/kudu/codegen/codegen-test.cc: Line 373: FLAGS_codegen_cache_capacity = 20; This fixes the test for two CPUs, but can we either fix the test to be more robust when the capacity isn't exactly 20, or fix the underlying sharding implementation to always use the provided capacity (maybe we'd need to enforce only certain kinds of capacity values e.g. powers of 2 to do this)? http://gerrit.cloudera.org:8080/#/c/4446/6/src/kudu/gutil/sysinfo.cc File src/kudu/gutil/sysinfo.cc: PS6, Line 64: "Advanced option. Use at your own risk"); I think what Todd meant by "flag it as advanced" was that you use TAG_FLAG to explicitly tag it. However, tag_flags is part of the util module, which depends on gutil, so doing so would yield a circular dependency. One way to address it would be as follows: 1) Add a kudu::NumCPUs() method somewhere in the util module. 2) Make it call base::NumCPUs(). 3) Define this new flag and handle the override there. 4) Change all existing callers to use kudu::NumCPUs() instead. -- To view, visit http://gerrit.cloudera.org:8080/4446 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I81b70f63923078d449f6541a61b292517e49877d Gerrit-PatchSet: 6 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Brock NolandGerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Brock Noland Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Todd Lipcon Gerrit-Reviewer: Will Berkeley Gerrit-HasComments: Yes
[kudu-CR] [tests] fix test which fails with two cpus and document other dependencies
Hello Kudu Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/4446 to look at the new patch set (#6). Change subject: [tests] fix test which fails with two cpus and document other dependencies .. [tests] fix test which fails with two cpus and document other dependencies Various tests fail due to lack of lsof and low resource limits, which are not documented. CodegenTest.TestCodeCache - fails on my two cpu host, t2.large increasing the cache size resolves this failure. The test is depending on: 1) key skew in the SharedLRUCache to obtain hits 2) the fact that capacity of SharedLRUCache is higher than the capacity configured if the configured capacity is not divisible by the the number of CPUs. For example, the capacity is set here: FLAGS_codegen_cache_capacity = 10; However, if the capacity is not perfectly divisible by the number of CPUs, actual capacity is slightly higher. CPU 2 => Capacity 10, 5/shard CPU 4 => Capacity 12, 3/shard CPU 8 => Capacity 16, 2/shard Due to this calculation: const size_t per_shard = (capacity + (num_shards - 1)) / num_shards; Additionally, the test depends on key skew. For example, I added some temporary logging which logged each insert. Let's look at inserts into shard 0. Under the 4 CPU case, where each shard has a capacity of 3, shard 3 only sees three inserts in pass 0 resulting in hits on the next pass: pass: 0 Insert: hash = 460595995, shard = 0 Insert: hash = 339190469, shard = 0 Insert: hash = 326003543, shard = 0 pass: 1 Under the two CPU case, both shard's see more than 5 inserts, causing no cache hits. pass: 0 Insert: hash = 1886151623, shard = 0 Insert: hash = 1395239506, shard = 0 Insert: hash = 1931154674, shard = 0 Insert: hash = 460595995, shard = 0 Insert: hash = 1440596256, shard = 0 Insert: hash = 1870227699, shard = 0 Insert: hash = 1163308785, shard = 0 Insert: hash = 1980547462, shard = 0 Insert: hash = 1106104592, shard = 0 Insert: hash = 1702846352, shard = 0 Insert: hash = 1230845174, shard = 0 Insert: hash = 1903296752, shard = 0 Insert: hash = 1395526688, shard = 0 Insert: hash = 339190469, shard = 0 Insert: hash = 1540160781, shard = 0 Insert: hash = 1377131543, shard = 0 Insert: hash = 2125989246, shard = 0 Insert: hash = 326003543, shard = 0 pass: 1 Insert: hash = 1886151623, shard = 0 Insert: hash = 1395239506, shard = 0 Insert: hash = 1931154674, shard = 0 Insert: hash = 460595995, shard = 0 Insert: hash = 1440596256, shard = 0 Insert: hash = 1870227699, shard = 0 Insert: hash = 1163308785, shard = 0 Insert: hash = 1980547462, shard = 0 Insert: hash = 1106104592, shard = 0 Insert: hash = 1702846352, shard = 0 Insert: hash = 1230845174, shard = 0 Insert: hash = 1903296752, shard = 0 Insert: hash = 1395526688, shard = 0 Insert: hash = 339190469, shard = 0 Insert: hash = 1540160781, shard = 0 Insert: hash = 1377131543, shard = 0 Insert: hash = 2125989246, shard = 0 Insert: hash = 326003543, shard = 0 AFAICT increasing the capacity of the cache doesn't impact correctness. Change-Id: I81b70f63923078d449f6541a61b292517e49877d --- M docs/installation.adoc M src/kudu/codegen/codegen-test.cc M src/kudu/gutil/sysinfo.cc 3 files changed, 11 insertions(+), 3 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/46/4446/6 -- To view, visit http://gerrit.cloudera.org:8080/4446 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I81b70f63923078d449f6541a61b292517e49877d Gerrit-PatchSet: 6 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Brock NolandGerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Brock Noland Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Todd Lipcon Gerrit-Reviewer: Will Berkeley
[kudu-CR] [tests] fix test which fails with two cpus and document other dependencies
Brock Noland has posted comments on this change. Change subject: [tests] fix test which fails with two cpus and document other dependencies .. Patch Set 5: Makes sense. I thought /proc/sys/kernel/pid_max was higher on 64bit systems than the 32bit default. -- To view, visit http://gerrit.cloudera.org:8080/4446 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I81b70f63923078d449f6541a61b292517e49877d Gerrit-PatchSet: 5 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Brock NolandGerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Brock Noland Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Todd Lipcon Gerrit-Reviewer: Will Berkeley Gerrit-HasComments: No
[kudu-CR] [tests] fix test which fails with two cpus and document other dependencies
Brock Noland has posted comments on this change. Change subject: [tests] fix test which fails with two cpus and document other dependencies .. Patch Set 5: (1 comment) http://gerrit.cloudera.org:8080/#/c/4446/5/docs/installation.adoc File docs/installation.adoc: Line 56: - Limits nproc and nofile greater than 32768 > which tests fail with nproc <= 32768? If this is just a test requirement, I I didn't hit nproc, I hit nofile. However, given the low default settings for both of these, I was just suggesting we just document a reasonable setting "generally" which is why I placed it here. Happy to move it somewhere else or adjust as needed. BTW, CM sets both limits to: Max processes 6553665536processes Max open files3276832768files for everything running as a child process. As such, it might make sense to ensure the most tested configuration is specified here. -- To view, visit http://gerrit.cloudera.org:8080/4446 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I81b70f63923078d449f6541a61b292517e49877d Gerrit-PatchSet: 5 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Brock NolandGerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Brock Noland Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Todd Lipcon Gerrit-Reviewer: Will Berkeley Gerrit-HasComments: Yes
[kudu-CR] [tests] fix test which fails with two cpus and document other dependencies
Todd Lipcon has posted comments on this change. Change subject: [tests] fix test which fails with two cpus and document other dependencies .. Patch Set 5: (2 comments) http://gerrit.cloudera.org:8080/#/c/4446/5/docs/installation.adoc File docs/installation.adoc: Line 56: - Limits nproc and nofile greater than 32768 which tests fail with nproc <= 32768? If this is just a test requirement, I dont think it should be in the installation adoc. Plus I'm surprised we use that many threads in any test (perhaps it's a bug) http://gerrit.cloudera.org:8080/#/c/4446/5/src/kudu/gutil/sysinfo.cc File src/kudu/gutil/sysinfo.cc: Line 63: DEFINE_int32(num_cpus, 0, "Override number of CPUs by setting to a value > 0"); can you flag this as advanced? also perhaps say something like 'Override the auto-detected number of CPUs on this system' since you can't actually override the number of CPUs :) -- To view, visit http://gerrit.cloudera.org:8080/4446 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I81b70f63923078d449f6541a61b292517e49877d Gerrit-PatchSet: 5 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Brock NolandGerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Brock Noland Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Todd Lipcon Gerrit-Reviewer: Will Berkeley Gerrit-HasComments: Yes
[kudu-CR] [tests] fix test which fails with two cpus and document other dependencies
Hello Kudu Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/4446 to look at the new patch set (#5). Change subject: [tests] fix test which fails with two cpus and document other dependencies .. [tests] fix test which fails with two cpus and document other dependencies Various tests fail due to lack of lsof and low resource limits, which are not documented. CodegenTest.TestCodeCache - fails on my two cpu host, t2.large increasing the cache size resolves this failure. The test is depending on: 1) key skew in the SharedLRUCache to obtain hits 2) the fact that capacity of SharedLRUCache is higher than the capacity configured if the configured capacity is not divisible by the the number of CPUs. For example, the capacity is set here: FLAGS_codegen_cache_capacity = 10; However, if the capacity is not perfectly divisible by the number of CPUs, actual capacity is slightly higher. CPU 2 => Capacity 10, 5/shard CPU 4 => Capacity 12, 3/shard CPU 8 => Capacity 16, 2/shard Due to this calculation: const size_t per_shard = (capacity + (num_shards - 1)) / num_shards; Additionally, the test depends on key skew. For example, I added some temporary logging which logged each insert. Let's look at inserts into shard 0. Under the 4 CPU case, where each shard has a capacity of 3, shard 3 only sees three inserts in pass 0 resulting in hits on the next pass: pass: 0 Insert: hash = 460595995, shard = 0 Insert: hash = 339190469, shard = 0 Insert: hash = 326003543, shard = 0 pass: 1 Under the two CPU case, both shard's see more than 5 inserts, causing no cache hits. pass: 0 Insert: hash = 1886151623, shard = 0 Insert: hash = 1395239506, shard = 0 Insert: hash = 1931154674, shard = 0 Insert: hash = 460595995, shard = 0 Insert: hash = 1440596256, shard = 0 Insert: hash = 1870227699, shard = 0 Insert: hash = 1163308785, shard = 0 Insert: hash = 1980547462, shard = 0 Insert: hash = 1106104592, shard = 0 Insert: hash = 1702846352, shard = 0 Insert: hash = 1230845174, shard = 0 Insert: hash = 1903296752, shard = 0 Insert: hash = 1395526688, shard = 0 Insert: hash = 339190469, shard = 0 Insert: hash = 1540160781, shard = 0 Insert: hash = 1377131543, shard = 0 Insert: hash = 2125989246, shard = 0 Insert: hash = 326003543, shard = 0 pass: 1 Insert: hash = 1886151623, shard = 0 Insert: hash = 1395239506, shard = 0 Insert: hash = 1931154674, shard = 0 Insert: hash = 460595995, shard = 0 Insert: hash = 1440596256, shard = 0 Insert: hash = 1870227699, shard = 0 Insert: hash = 1163308785, shard = 0 Insert: hash = 1980547462, shard = 0 Insert: hash = 1106104592, shard = 0 Insert: hash = 1702846352, shard = 0 Insert: hash = 1230845174, shard = 0 Insert: hash = 1903296752, shard = 0 Insert: hash = 1395526688, shard = 0 Insert: hash = 339190469, shard = 0 Insert: hash = 1540160781, shard = 0 Insert: hash = 1377131543, shard = 0 Insert: hash = 2125989246, shard = 0 Insert: hash = 326003543, shard = 0 AFAICT increasing the capacity of the cache doesn't impact correctness. Change-Id: I81b70f63923078d449f6541a61b292517e49877d --- M docs/installation.adoc M src/kudu/codegen/codegen-test.cc M src/kudu/gutil/sysinfo.cc 3 files changed, 10 insertions(+), 3 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/46/4446/5 -- To view, visit http://gerrit.cloudera.org:8080/4446 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I81b70f63923078d449f6541a61b292517e49877d Gerrit-PatchSet: 5 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Brock NolandGerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Brock Noland Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Todd Lipcon Gerrit-Reviewer: Will Berkeley
[kudu-CR] [tests] fix test which fails with two cpus and document other dependencies
Hello Kudu Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/4446 to look at the new patch set (#4). Change subject: [tests] fix test which fails with two cpus and document other dependencies .. [tests] fix test which fails with two cpus and document other dependencies Various tests fail due to lack of lsof and low resource limits, which are not documented. CodegenTest.TestCodeCache - fails on my two cpu host, t2.large increasing the cache size resolves this failure. The test is depending on: 1) key skew in the SharedLRUCache to obtain hits 2) the fact that capacity of SharedLRUCache is higher than the capacity configured if the configured capacity is not divisible by the the number of CPUs. For example, the capacity is set here: FLAGS_codegen_cache_capacity = 10; However, if the capacity is not perfectly divisible by the number of CPUs, actual capacity is slightly higher. CPU 2 => Capacity 10, 5/shard CPU 4 => Capacity 12, 3/shard CPU 8 => Capacity 16, 2/shard Due to this calculation: const size_t per_shard = (capacity + (num_shards - 1)) / num_shards; Additionally, the test depends on key skew. For example, I added some temporary logging which logged each insert. Let's look at inserts into shard 0. Under the 4 CPU case, where each shard has a capacity of 3, shard 3 only sees three inserts in pass 0 resulting in hits on the next pass: pass: 0 Insert: hash = 460595995, shard = 0 Insert: hash = 339190469, shard = 0 Insert: hash = 326003543, shard = 0 pass: 1 Under the two CPU case, both shard's see more than 5 inserts, causing no cache hits. pass: 0 Insert: hash = 1886151623, shard = 0 Insert: hash = 1395239506, shard = 0 Insert: hash = 1931154674, shard = 0 Insert: hash = 460595995, shard = 0 Insert: hash = 1440596256, shard = 0 Insert: hash = 1870227699, shard = 0 Insert: hash = 1163308785, shard = 0 Insert: hash = 1980547462, shard = 0 Insert: hash = 1106104592, shard = 0 Insert: hash = 1702846352, shard = 0 Insert: hash = 1230845174, shard = 0 Insert: hash = 1903296752, shard = 0 Insert: hash = 1395526688, shard = 0 Insert: hash = 339190469, shard = 0 Insert: hash = 1540160781, shard = 0 Insert: hash = 1377131543, shard = 0 Insert: hash = 2125989246, shard = 0 Insert: hash = 326003543, shard = 0 pass: 1 Insert: hash = 1886151623, shard = 0 Insert: hash = 1395239506, shard = 0 Insert: hash = 1931154674, shard = 0 Insert: hash = 460595995, shard = 0 Insert: hash = 1440596256, shard = 0 Insert: hash = 1870227699, shard = 0 Insert: hash = 1163308785, shard = 0 Insert: hash = 1980547462, shard = 0 Insert: hash = 1106104592, shard = 0 Insert: hash = 1702846352, shard = 0 Insert: hash = 1230845174, shard = 0 Insert: hash = 1903296752, shard = 0 Insert: hash = 1395526688, shard = 0 Insert: hash = 339190469, shard = 0 Insert: hash = 1540160781, shard = 0 Insert: hash = 1377131543, shard = 0 Insert: hash = 2125989246, shard = 0 Insert: hash = 326003543, shard = 0 AFAICT increasing the capacity of the cache doesn't impact correctness. Change-Id: I81b70f63923078d449f6541a61b292517e49877d --- M docs/installation.adoc M src/kudu/codegen/codegen-test.cc M src/kudu/gutil/sysinfo.cc 3 files changed, 10 insertions(+), 3 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/46/4446/4 -- To view, visit http://gerrit.cloudera.org:8080/4446 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I81b70f63923078d449f6541a61b292517e49877d Gerrit-PatchSet: 4 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Brock NolandGerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Brock Noland Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Todd Lipcon Gerrit-Reviewer: Will Berkeley