[kudu-CR] [java client] Reinstate KUDU-1364's behavior, fix NPE
Jean-Daniel Cryans has submitted this change and it was merged.

Change subject: [java client] Reinstate KUDU-1364's behavior, fix NPE
..

[java client] Reinstate KUDU-1364's behavior, fix NPE

When d5082d8 tried to fix the client2tablets leak, it also undid the work from KUDU-1364 and introduced new problems.

This patch brings back the caching of replica locations even when getting TS disconnections by not purging the RemoteTablet caches on disconnection. Instead, the purge is now done by the retried RPCs themselves after TabletClient detects an uncaughtException, similarly to how it was calling demoteAsLeaderForAllTablets before.

The NPE is fixed with a null check; it's an unfortunate race.

I spent some time trying to come up with a simple test but failed. ITClient has found the issue before so we know we have _some_ coverage.

Change-Id: I8e0ed23fbf4c655037b77173a187c3fa11de4f63
Reviewed-on: http://gerrit.cloudera.org:8080/4501
Tested-by: Kudu Jenkins
Reviewed-by: David Ribeiro Alves
---
M java/kudu-client/src/main/java/org/apache/kudu/client/AsyncKuduClient.java
M java/kudu-client/src/main/java/org/apache/kudu/client/TabletClient.java
2 files changed, 23 insertions(+), 40 deletions(-)

Approvals:
  David Ribeiro Alves: Looks good to me, approved
  Kudu Jenkins: Verified

--
To view, visit http://gerrit.cloudera.org:8080/4501
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: merged
Gerrit-Change-Id: I8e0ed23fbf4c655037b77173a187c3fa11de4f63
Gerrit-PatchSet: 2
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Jean-Daniel Cryans
Gerrit-Reviewer: Adar Dembo
Gerrit-Reviewer: David Ribeiro Alves
Gerrit-Reviewer: Jean-Daniel Cryans
Gerrit-Reviewer: Kudu Jenkins
[kudu-CR] [tests] fix test which fails with two cpus and document other dependencies
Hello Kudu Jenkins,

I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/4446 to look at the new patch set (#6).

Change subject: [tests] fix test which fails with two cpus and document other dependencies
..

[tests] fix test which fails with two cpus and document other dependencies

Various tests fail due to a lack of lsof and low resource limits, neither of which is documented.

CodegenTest.TestCodeCache fails on my two-CPU host (t2.large); increasing the cache size resolves this failure. The test depends on:

1) key skew in the SharedLRUCache to obtain hits
2) the fact that the capacity of the SharedLRUCache is higher than the configured capacity if the configured capacity is not divisible by the number of CPUs.

For example, the capacity is set here:

  FLAGS_codegen_cache_capacity = 10;

However, if the capacity is not perfectly divisible by the number of CPUs, the actual capacity is slightly higher:

  CPU 2 => Capacity 10, 5/shard
  CPU 4 => Capacity 12, 3/shard
  CPU 8 => Capacity 16, 2/shard

This is due to the following calculation:

  const size_t per_shard = (capacity + (num_shards - 1)) / num_shards;

Additionally, the test depends on key skew. For example, I added some temporary logging which logged each insert. Let's look at inserts into shard 0. In the 4 CPU case, where each shard has a capacity of 3, shard 0 sees only three inserts in pass 0, resulting in hits on the next pass:

  pass: 0
  Insert: hash = 460595995, shard = 0
  Insert: hash = 339190469, shard = 0
  Insert: hash = 326003543, shard = 0
  pass: 1

In the two CPU case, both shards see more than 5 inserts, causing no cache hits:

  pass: 0
  Insert: hash = 1886151623, shard = 0
  Insert: hash = 1395239506, shard = 0
  Insert: hash = 1931154674, shard = 0
  Insert: hash = 460595995, shard = 0
  Insert: hash = 1440596256, shard = 0
  Insert: hash = 1870227699, shard = 0
  Insert: hash = 1163308785, shard = 0
  Insert: hash = 1980547462, shard = 0
  Insert: hash = 1106104592, shard = 0
  Insert: hash = 1702846352, shard = 0
  Insert: hash = 1230845174, shard = 0
  Insert: hash = 1903296752, shard = 0
  Insert: hash = 1395526688, shard = 0
  Insert: hash = 339190469, shard = 0
  Insert: hash = 1540160781, shard = 0
  Insert: hash = 1377131543, shard = 0
  Insert: hash = 2125989246, shard = 0
  Insert: hash = 326003543, shard = 0
  pass: 1
  Insert: hash = 1886151623, shard = 0
  Insert: hash = 1395239506, shard = 0
  Insert: hash = 1931154674, shard = 0
  Insert: hash = 460595995, shard = 0
  Insert: hash = 1440596256, shard = 0
  Insert: hash = 1870227699, shard = 0
  Insert: hash = 1163308785, shard = 0
  Insert: hash = 1980547462, shard = 0
  Insert: hash = 1106104592, shard = 0
  Insert: hash = 1702846352, shard = 0
  Insert: hash = 1230845174, shard = 0
  Insert: hash = 1903296752, shard = 0
  Insert: hash = 1395526688, shard = 0
  Insert: hash = 339190469, shard = 0
  Insert: hash = 1540160781, shard = 0
  Insert: hash = 1377131543, shard = 0
  Insert: hash = 2125989246, shard = 0
  Insert: hash = 326003543, shard = 0

AFAICT increasing the capacity of the cache doesn't impact correctness.
Change-Id: I81b70f63923078d449f6541a61b292517e49877d --- M docs/installation.adoc M src/kudu/codegen/codegen-test.cc M src/kudu/gutil/sysinfo.cc 3 files changed, 11 insertions(+), 3 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/46/4446/6 -- To view, visit http://gerrit.cloudera.org:8080/4446 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I81b70f63923078d449f6541a61b292517e49877d Gerrit-PatchSet: 6 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Brock Noland Gerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Brock Noland Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Todd Lipcon Gerrit-Reviewer: Will Berkeley
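The rounding described in the commit message above can be checked with a short sketch. It assumes one shard per CPU, as in the CPU/capacity table, and that every shard is given the same rounded-up capacity. `PerShardCapacity` and `EffectiveCapacity` are hypothetical helper names used for illustration, not functions from the Kudu source.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper: entries per shard, rounded up, using the same
// expression quoted in the commit message.
inline std::size_t PerShardCapacity(std::size_t capacity, std::size_t num_shards) {
  assert(num_shards > 0);
  return (capacity + (num_shards - 1)) / num_shards;
}

// Hypothetical helper: total capacity once every shard has been given the
// rounded-up per-shard capacity. This can exceed the configured capacity.
inline std::size_t EffectiveCapacity(std::size_t capacity, std::size_t num_shards) {
  return PerShardCapacity(capacity, num_shards) * num_shards;
}
```

With a configured capacity of 10, this gives 5 per shard (10 total) for 2 shards, 3 per shard (12 total) for 4 shards, and 2 per shard (16 total) for 8 shards, matching the figures quoted in the commit message.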
[kudu-CR] [tests] fix test which fails with two cpus and document other dependencies
Brock Noland has posted comments on this change. Change subject: [tests] fix test which fails with two cpus and document other dependencies .. Patch Set 5: Makes sense. I thought /proc/sys/kernel/pid_max was higher on 64bit systems than the 32bit default. -- To view, visit http://gerrit.cloudera.org:8080/4446 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I81b70f63923078d449f6541a61b292517e49877d Gerrit-PatchSet: 5 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Brock Noland Gerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Brock Noland Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Todd Lipcon Gerrit-Reviewer: Will Berkeley Gerrit-HasComments: No
[kudu-CR] [tests] fix test which fails with two cpus and document other dependencies
Todd Lipcon has posted comments on this change. Change subject: [tests] fix test which fails with two cpus and document other dependencies .. Patch Set 5: (1 comment) http://gerrit.cloudera.org:8080/#/c/4446/5/docs/installation.adoc File docs/installation.adoc: Line 56: - Limits nproc and nofile greater than 32768 > I didn't hit nproc, I hit nofile. However, given the low default settings f Well, looking at a server I'm running which has 406 tablets running, it's using 1036 threads. So, to cross 32k threads you'd probably need 10,000+ tablets, and a lot of other stuff would break first. As an operator I'd probably recommend a lower nproc ulimit than 65536, because if you set it that high, then a runaway loop which is leaking threads in one process could easily use all the pids on a system and prevent other unrelated processes from starting threads (whereas I'd prefer just Kudu to crash) -- To view, visit http://gerrit.cloudera.org:8080/4446 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I81b70f63923078d449f6541a61b292517e49877d Gerrit-PatchSet: 5 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Brock Noland Gerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Brock Noland Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Todd Lipcon Gerrit-Reviewer: Will Berkeley Gerrit-HasComments: Yes
[kudu-CR] KUDU-1563. Add support for INSERT IGNORE
Brock Noland has posted comments on this change. Change subject: KUDU-1563. Add support for INSERT IGNORE .. Patch Set 7: Tests appear to be unrelated as the later patches had their tests pass. -- To view, visit http://gerrit.cloudera.org:8080/4491 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I5bfc35e9d27bd5e2d3375b68e6e4716ed671f36c Gerrit-PatchSet: 7 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Brock Noland Gerrit-Reviewer: Brock Noland Gerrit-Reviewer: David Ribeiro Alves Gerrit-Reviewer: Jean-Daniel Cryans Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Tidy Bot Gerrit-HasComments: No
[kudu-CR] [tests] fix test which fails with two cpus and document other dependencies
Brock Noland has posted comments on this change.

Change subject: [tests] fix test which fails with two cpus and document other dependencies
..

Patch Set 5:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/4446/5/docs/installation.adoc
File docs/installation.adoc:

Line 56: - Limits nproc and nofile greater than 32768
> which tests fail with nproc <= 32768? If this is just a test requirement, I

I didn't hit nproc, I hit nofile. However, given the low default settings for both of these, I was suggesting we document a reasonable "general" setting, which is why I placed it here. Happy to move it somewhere else or adjust as needed.

BTW, CM sets both limits to:

  Max processes     65536    65536    processes
  Max open files    32768    32768    files

for everything running as a child process. As such, it might make sense to ensure the most tested configuration is specified here.

--
To view, visit http://gerrit.cloudera.org:8080/4446
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I81b70f63923078d449f6541a61b292517e49877d
Gerrit-PatchSet: 5
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Brock Noland
Gerrit-Reviewer: Adar Dembo
Gerrit-Reviewer: Brock Noland
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon
Gerrit-Reviewer: Will Berkeley
Gerrit-HasComments: Yes
[kudu-CR] cache: fix behavior on single-CPU systems
Hello Kudu Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/4535 to look at the new patch set (#2). Change subject: cache: fix behavior on single-CPU systems .. cache: fix behavior on single-CPU systems On a system with only a single CPU, shard_bits_ would be set to 0. This would then result in calculating 'hash >> (32 - 0)' which is undefined behavior. With optimizations, this would turn into a no-op, and we'd end up using the whole hash as the shard index, instead of 0, causing a crash. The fix is to widen the hash to uint64_t before shifting. Tested by manually making NumCPUs return 1 and running cache-test. Change-Id: I7809e5697df657a589b2ceae5c6d4edbf161b52a --- M src/kudu/util/cache.cc 1 file changed, 3 insertions(+), 1 deletion(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/35/4535/2 -- To view, visit http://gerrit.cloudera.org:8080/4535 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I7809e5697df657a589b2ceae5c6d4edbf161b52a Gerrit-PatchSet: 2 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Todd Lipcon Gerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Marcel Kornacker
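A minimal sketch of the widening fix described in the commit message above, assuming the shard index is taken from the top `shard_bits` bits of a 32-bit hash. `ShardIndex` is a hypothetical name; the actual change in src/kudu/util/cache.cc may be written differently.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper, not copied from cache.cc.
// Shifting a 32-bit value by 32 bits is undefined behavior in C++, so the
// hash is widened to 64 bits before the shift. With shard_bits == 0 the
// shift is by 32, which is well defined for a 64-bit operand and yields 0.
inline std::uint32_t ShardIndex(std::uint32_t hash, int shard_bits) {
  assert(shard_bits >= 0 && shard_bits <= 32);
  return static_cast<std::uint32_t>(
      static_cast<std::uint64_t>(hash) >> (32 - shard_bits));
}
```

In this sketch, `ShardIndex(h, 0)` is 0 for every `h`, so a single-CPU system always selects shard 0 instead of using the whole hash as an index.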
[kudu-CR] [tests] fix test which fails with two cpus and document other dependencies
Todd Lipcon has posted comments on this change.

Change subject: [tests] fix test which fails with two cpus and document other dependencies
..

Patch Set 5:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/4446/5/docs/installation.adoc
File docs/installation.adoc:

Line 56: - Limits nproc and nofile greater than 32768
Which tests fail with nproc <= 32768? If this is just a test requirement, I don't think it should be in the installation adoc. Plus, I'm surprised we use that many threads in any test (perhaps it's a bug).

http://gerrit.cloudera.org:8080/#/c/4446/5/src/kudu/gutil/sysinfo.cc
File src/kudu/gutil/sysinfo.cc:

Line 63: DEFINE_int32(num_cpus, 0, "Override number of CPUs by setting to a value > 0");
Can you flag this as advanced? Also, perhaps say something like 'Override the auto-detected number of CPUs on this system', since you can't actually override the number of CPUs :)

--
To view, visit http://gerrit.cloudera.org:8080/4446
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I81b70f63923078d449f6541a61b292517e49877d
Gerrit-PatchSet: 5
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Brock Noland
Gerrit-Reviewer: Adar Dembo
Gerrit-Reviewer: Brock Noland
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon
Gerrit-Reviewer: Will Berkeley
Gerrit-HasComments: Yes
[kudu-CR] [tests] fix test which fails with two cpus and document other dependencies
Hello Kudu Jenkins,

I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/4446 to look at the new patch set (#5).

Change subject: [tests] fix test which fails with two cpus and document other dependencies
..

[tests] fix test which fails with two cpus and document other dependencies

Various tests fail due to a lack of lsof and low resource limits, neither of which is documented.

CodegenTest.TestCodeCache fails on my two-CPU host (t2.large); increasing the cache size resolves this failure. The test depends on:

1) key skew in the SharedLRUCache to obtain hits
2) the fact that the capacity of the SharedLRUCache is higher than the configured capacity if the configured capacity is not divisible by the number of CPUs.

For example, the capacity is set here:

  FLAGS_codegen_cache_capacity = 10;

However, if the capacity is not perfectly divisible by the number of CPUs, the actual capacity is slightly higher:

  CPU 2 => Capacity 10, 5/shard
  CPU 4 => Capacity 12, 3/shard
  CPU 8 => Capacity 16, 2/shard

This is due to the following calculation:

  const size_t per_shard = (capacity + (num_shards - 1)) / num_shards;

Additionally, the test depends on key skew. For example, I added some temporary logging which logged each insert. Let's look at inserts into shard 0. In the 4 CPU case, where each shard has a capacity of 3, shard 0 sees only three inserts in pass 0, resulting in hits on the next pass:

  pass: 0
  Insert: hash = 460595995, shard = 0
  Insert: hash = 339190469, shard = 0
  Insert: hash = 326003543, shard = 0
  pass: 1

In the two CPU case, both shards see more than 5 inserts, causing no cache hits:

  pass: 0
  Insert: hash = 1886151623, shard = 0
  Insert: hash = 1395239506, shard = 0
  Insert: hash = 1931154674, shard = 0
  Insert: hash = 460595995, shard = 0
  Insert: hash = 1440596256, shard = 0
  Insert: hash = 1870227699, shard = 0
  Insert: hash = 1163308785, shard = 0
  Insert: hash = 1980547462, shard = 0
  Insert: hash = 1106104592, shard = 0
  Insert: hash = 1702846352, shard = 0
  Insert: hash = 1230845174, shard = 0
  Insert: hash = 1903296752, shard = 0
  Insert: hash = 1395526688, shard = 0
  Insert: hash = 339190469, shard = 0
  Insert: hash = 1540160781, shard = 0
  Insert: hash = 1377131543, shard = 0
  Insert: hash = 2125989246, shard = 0
  Insert: hash = 326003543, shard = 0
  pass: 1
  Insert: hash = 1886151623, shard = 0
  Insert: hash = 1395239506, shard = 0
  Insert: hash = 1931154674, shard = 0
  Insert: hash = 460595995, shard = 0
  Insert: hash = 1440596256, shard = 0
  Insert: hash = 1870227699, shard = 0
  Insert: hash = 1163308785, shard = 0
  Insert: hash = 1980547462, shard = 0
  Insert: hash = 1106104592, shard = 0
  Insert: hash = 1702846352, shard = 0
  Insert: hash = 1230845174, shard = 0
  Insert: hash = 1903296752, shard = 0
  Insert: hash = 1395526688, shard = 0
  Insert: hash = 339190469, shard = 0
  Insert: hash = 1540160781, shard = 0
  Insert: hash = 1377131543, shard = 0
  Insert: hash = 2125989246, shard = 0
  Insert: hash = 326003543, shard = 0

AFAICT increasing the capacity of the cache doesn't impact correctness.
Change-Id: I81b70f63923078d449f6541a61b292517e49877d --- M docs/installation.adoc M src/kudu/codegen/codegen-test.cc M src/kudu/gutil/sysinfo.cc 3 files changed, 10 insertions(+), 3 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/46/4446/5 -- To view, visit http://gerrit.cloudera.org:8080/4446 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I81b70f63923078d449f6541a61b292517e49877d Gerrit-PatchSet: 5 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Brock Noland Gerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Brock Noland Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Todd Lipcon Gerrit-Reviewer: Will Berkeley
[kudu-CR] KUDU-1563. Add support for INSERT IGNORE
Hello Kudu Jenkins,

I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/4491 to look at the new patch set (#7).

Change subject: KUDU-1563. Add support for INSERT IGNORE
..

KUDU-1563. Add support for INSERT IGNORE

Adds the `INSERT IGNORE' operation, which behaves like a normal `INSERT' except when a duplicate row error would be raised because the primary key was previously inserted; in that case the error is ignored. Follows upsert backend/c++ patch 56c431585ed7ad07ef.

Change-Id: I5bfc35e9d27bd5e2d3375b68e6e4716ed671f36c
---
M src/kudu/client/client-test.cc
M src/kudu/client/client.cc
M src/kudu/client/client.h
M src/kudu/client/write_op.cc
M src/kudu/client/write_op.h
M src/kudu/common/row_operations-test.cc
M src/kudu/common/row_operations.cc
M src/kudu/common/row_operations.h
M src/kudu/common/wire_protocol.proto
M src/kudu/integration-tests/fuzz-itest.cc
M src/kudu/tablet/local_tablet_writer.h
M src/kudu/tablet/row_op.cc
M src/kudu/tablet/row_op.h
M src/kudu/tablet/tablet-test-base.h
M src/kudu/tablet/tablet-test.cc
M src/kudu/tablet/tablet.cc
M src/kudu/tablet/tablet.h
M src/kudu/tablet/tablet_bootstrap.cc
M src/kudu/tablet/tablet_metrics.cc
M src/kudu/tablet/tablet_metrics.h
M src/kudu/tablet/tablet_random_access-test.cc
M src/kudu/tablet/transactions/transaction.cc
M src/kudu/tablet/transactions/transaction.h
M src/kudu/tablet/transactions/write_transaction.cc
24 files changed, 270 insertions(+), 47 deletions(-)

git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/91/4491/7

--
To view, visit http://gerrit.cloudera.org:8080/4491
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I5bfc35e9d27bd5e2d3375b68e6e4716ed671f36c
Gerrit-PatchSet: 7
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Brock Noland
Gerrit-Reviewer: Brock Noland
Gerrit-Reviewer: David Ribeiro Alves
Gerrit-Reviewer: Jean-Daniel Cryans
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot
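The difference between the two operations described in the commit message above can be illustrated with a toy in-memory model. This is only a sketch of the semantics: `ToyTable`, `WriteResult`, and their methods are hypothetical names and are not the Kudu client API, and it assumes that an ignored insert leaves the existing row unchanged.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>

// Toy stand-in for a table keyed by primary key. NOT the Kudu client API.
enum class WriteResult { kOk, kAlreadyPresent };

class ToyTable {
 public:
  // Plain INSERT: a duplicate primary key is reported as an error.
  WriteResult Insert(int key, const std::string& value) {
    const bool inserted = rows_.emplace(key, value).second;
    return inserted ? WriteResult::kOk : WriteResult::kAlreadyPresent;
  }

  // INSERT IGNORE: a duplicate primary key is silently skipped. The existing
  // row is left unchanged (an assumption of this sketch) and no error is
  // reported.
  WriteResult InsertIgnore(int key, const std::string& value) {
    rows_.emplace(key, value);  // emplace does nothing if the key exists
    return WriteResult::kOk;
  }

  const std::string& Get(int key) const { return rows_.at(key); }
  std::size_t size() const { return rows_.size(); }

 private:
  std::map<int, std::string> rows_;
};
```

In this model, inserting key 1 twice with `Insert` reports `kAlreadyPresent` on the second call, while `InsertIgnore` reports `kOk` and the row keeps its original value.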
[kudu-CR] [python] KUDU-1563. Add support for INSERT IGNORE
Hello Kudu Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/4522 to look at the new patch set (#4). Change subject: [python] KUDU-1563. Add support for INSERT IGNORE .. [python] KUDU-1563. Add support for INSERT IGNORE Implements python support for the `INSERT IGNORE' operation Change-Id: I6c45a50d4b87d8f7c4f0f83fbc72932d056d3a79 --- M python/kudu/__init__.py M python/kudu/client.pyx M python/kudu/libkudu_client.pxd M python/kudu/tests/test_client.py 4 files changed, 37 insertions(+), 2 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/22/4522/4 -- To view, visit http://gerrit.cloudera.org:8080/4522 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I6c45a50d4b87d8f7c4f0f83fbc72932d056d3a79 Gerrit-PatchSet: 4 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Brock Noland Gerrit-Reviewer: Kudu Jenkins
[kudu-CR] KUDU-1563. Add support for INSERT IGNORE
Hello Kudu Jenkins,

I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/4491 to look at the new patch set (#6).

Change subject: KUDU-1563. Add support for INSERT IGNORE
..

KUDU-1563. Add support for INSERT IGNORE

Adds the `INSERT IGNORE' operation, which behaves like a normal `INSERT' except when a duplicate row error would be raised because the primary key was previously inserted; in that case the error is ignored. Follows upsert backend/c++ patch 56c431585ed7ad07ef.

Change-Id: I5bfc35e9d27bd5e2d3375b68e6e4716ed671f36c
---
M src/kudu/client/client-test.cc
M src/kudu/client/client.cc
M src/kudu/client/client.h
M src/kudu/client/write_op.cc
M src/kudu/client/write_op.h
M src/kudu/common/row_operations-test.cc
M src/kudu/common/row_operations.cc
M src/kudu/common/row_operations.h
M src/kudu/common/wire_protocol.proto
M src/kudu/integration-tests/fuzz-itest.cc
M src/kudu/tablet/local_tablet_writer.h
M src/kudu/tablet/row_op.cc
M src/kudu/tablet/row_op.h
M src/kudu/tablet/tablet-test-base.h
M src/kudu/tablet/tablet-test.cc
M src/kudu/tablet/tablet.cc
M src/kudu/tablet/tablet.h
M src/kudu/tablet/tablet_bootstrap.cc
M src/kudu/tablet/tablet_metrics.cc
M src/kudu/tablet/tablet_metrics.h
M src/kudu/tablet/tablet_random_access-test.cc
M src/kudu/tablet/transactions/transaction.cc
M src/kudu/tablet/transactions/transaction.h
M src/kudu/tablet/transactions/write_transaction.cc
24 files changed, 270 insertions(+), 47 deletions(-)

git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/91/4491/6

--
To view, visit http://gerrit.cloudera.org:8080/4491
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I5bfc35e9d27bd5e2d3375b68e6e4716ed671f36c
Gerrit-PatchSet: 6
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Brock Noland
Gerrit-Reviewer: Brock Noland
Gerrit-Reviewer: David Ribeiro Alves
Gerrit-Reviewer: Jean-Daniel Cryans
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot