[kudu-CR] [tests] fix test which fails with two cpus and document other dependencies

2016-09-26 Thread Brock Noland (Code Review)
Brock Noland has posted comments on this change.

Change subject: [tests] fix test which fails with two cpus and document other 
dependencies
..


Patch Set 6:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/4446/6//COMMIT_MSG
Commit Message:

PS6, Line 17: 2) the fact that capacity of SharedLRUCache is higher than the capacity
: configured if the configured capacity is not divisible by the number
: of CPUs.
: 
: For example, the capacity is set here:
: 
:   FLAGS_codegen_cache_capacity = 10;
: 
: However, if the capacity is not perfectly divisible by the number of CPUs,
: actual capacity is slightly higher.
: 
: CPU 2 => Capacity 10, 5/shard
: CPU 4 => Capacity 12, 3/shard
: CPU 8 => Capacity 16, 2/shard
: 
: Due to this calculation:
: 
: const size_t per_shard = (capacity + (num_shards - 1)) / num_shards;
> I appreciate the detailed explanation here, but I'm having a hard time unde
Shoot, my explanation was subpar. I will update it. In each of those examples
the configured capacity is 10, while the actual capacity is 10, 12, and 16
respectively.

More details coming.


-- 
To view, visit http://gerrit.cloudera.org:8080/4446
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I81b70f63923078d449f6541a61b292517e49877d
Gerrit-PatchSet: 6
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Brock Noland 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Brock Noland 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon 
Gerrit-Reviewer: Will Berkeley 
Gerrit-HasComments: Yes


[kudu-CR] [tests] fix test which fails with two cpus and document other dependencies

2016-09-26 Thread Adar Dembo (Code Review)
Adar Dembo has posted comments on this change.

Change subject: [tests] fix test which fails with two cpus and document other 
dependencies
..


Patch Set 6:

(6 comments)

http://gerrit.cloudera.org:8080/#/c/4446/6//COMMIT_MSG
Commit Message:

PS6, Line 16: SharedLRUCache
Nit: this should be ShardedLRUCache. Below too.


PS6, Line 17: 2) the fact that capacity of SharedLRUCache is higher than the capacity
: configured if the configured capacity is not divisible by the number
: of CPUs.
: 
: For example, the capacity is set here:
: 
:   FLAGS_codegen_cache_capacity = 10;
: 
: However, if the capacity is not perfectly divisible by the number of CPUs,
: actual capacity is slightly higher.
: 
: CPU 2 => Capacity 10, 5/shard
: CPU 4 => Capacity 12, 3/shard
: CPU 8 => Capacity 16, 2/shard
: 
: Due to this calculation:
: 
: const size_t per_shard = (capacity + (num_shards - 1)) / num_shards;
I appreciate the detailed explanation here, but I'm having a hard time 
understanding it without looking at the code too. Maybe you can work in how 
num_shards is calculated? And make explicit that the actual capacity is 
per_shard * num_shards?

Also, in each of the examples you've provided, the capacity is perfectly 
divisible by the number of CPUs. Maybe add some negative examples?
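
To make the relationship explicit, here is a minimal sketch of the rounding,
assuming (as the CPU table in the commit message implies) that num_shards
equals the CPU count. How num_shards is actually derived is not shown in this
thread, so it is taken as a plain parameter here. The helper names
PerShardCapacity and ActualCapacity are hypothetical.

```cpp
#include <cassert>
#include <cstddef>

// Per-shard capacity, rounded up so the shards together hold at least the
// configured capacity. This is the expression quoted from the commit message.
std::size_t PerShardCapacity(std::size_t capacity, std::size_t num_shards) {
  assert(num_shards > 0);  // a cache needs at least one shard
  return (capacity + (num_shards - 1)) / num_shards;
}

// Effective total capacity: per_shard * num_shards. Because of the round-up,
// this is never smaller than the configured capacity.
std::size_t ActualCapacity(std::size_t capacity, std::size_t num_shards) {
  return PerShardCapacity(capacity, num_shards) * num_shards;
}
```

With a configured capacity of 10 this gives 10 for 2 shards, 12 for 4 shards,
and 16 for 8 shards. The actual capacity equals the configured one only when
the configured capacity is a multiple of num_shards; for instance a capacity
of 20 with 8 shards becomes 24.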


PS6, Line 47: shard's
Nit: shards


http://gerrit.cloudera.org:8080/#/c/4446/6/docs/installation.adoc
File docs/installation.adoc:

Please update the various script examples in this file, as well as the SLES 
instructions.


http://gerrit.cloudera.org:8080/#/c/4446/6/src/kudu/codegen/codegen-test.cc
File src/kudu/codegen/codegen-test.cc:

Line 373:   FLAGS_codegen_cache_capacity = 20;
This fixes the test for two CPUs, but can we either fix the test to be more 
robust when the capacity isn't exactly 20, or fix the underlying sharding 
implementation to always use the provided capacity (maybe we'd need to enforce 
only certain kinds of capacity values e.g. powers of 2 to do this)?
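
For context on why the hit count is so sensitive to the per-shard capacity,
here is a rough sketch of a single LRU shard replaying the same key sequence
twice. It assumes the test touches the same keys in the same order on each
pass, which is what the insert logs in the commit message suggest. LruShard
and SecondPassHits are hypothetical stand-ins, not the real cache code.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <list>
#include <unordered_map>
#include <vector>

// Minimal LRU set of keys with a fixed capacity (stand-in for one shard).
class LruShard {
 public:
  explicit LruShard(std::size_t capacity) : capacity_(capacity) {}

  // Returns true on a hit. On a miss, inserts the key and evicts the least
  // recently used key if the shard is now over capacity.
  bool Access(std::uint32_t key) {
    auto it = index_.find(key);
    if (it != index_.end()) {
      order_.splice(order_.begin(), order_, it->second);  // mark most recent
      return true;
    }
    order_.push_front(key);
    index_[key] = order_.begin();
    if (order_.size() > capacity_) {
      index_.erase(order_.back());
      order_.pop_back();
    }
    return false;
  }

 private:
  std::size_t capacity_;
  std::list<std::uint32_t> order_;  // front = most recently used
  std::unordered_map<std::uint32_t, std::list<std::uint32_t>::iterator> index_;
};

// Replays `keys` twice against one shard and counts hits in the second pass.
std::size_t SecondPassHits(std::size_t capacity,
                           const std::vector<std::uint32_t>& keys) {
  LruShard shard(capacity);
  for (std::uint32_t k : keys) shard.Access(k);
  std::size_t hits = 0;
  for (std::uint32_t k : keys) hits += shard.Access(k) ? 1 : 0;
  assert(hits <= keys.size());
  return hits;
}
```

Under this model, a shard that sees no more distinct keys than its capacity
hits on every second-pass lookup, while a shard that sees even one key more
than its capacity evicts each key just before it is needed again and gets no
hits at all. That cliff is why a small change in per-shard capacity flips the
test result.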


http://gerrit.cloudera.org:8080/#/c/4446/6/src/kudu/gutil/sysinfo.cc
File src/kudu/gutil/sysinfo.cc:

PS6, Line 64:  "Advanced option. Use at your own risk");
I think what Todd meant by "flag it as advanced" was that you use TAG_FLAG to 
explicitly tag it.

However, tag_flags is part of the util module, which depends on gutil, so doing 
so would yield a circular dependency. One way to address it would be as follows:
1) Add a kudu::NumCPUs() method somewhere in the util module.
2) Make it call base::NumCPUs().
3) Define this new flag and handle the override there.
4) Change all existing callers to use kudu::NumCPUs() instead.
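
A rough, self-contained sketch of steps 1 to 3, to show the intended shape.
The stand-ins are assumptions: FLAGS_num_cpus is a plain global here instead
of a gflags DEFINE_int32 (which would then be tagged with something like
TAG_FLAG(num_cpus, advanced)), and base::NumCPUs() is approximated with
std::thread::hardware_concurrency() rather than the gutil implementation.

```cpp
#include <cassert>
#include <cstdint>
#include <thread>

// Stand-in for the flag; the real change would define it with gflags in the
// util module so that it can be tagged as advanced.
std::int32_t FLAGS_num_cpus = 0;

namespace base {
// Stand-in for gutil's base::NumCPUs(); the real detection lives in sysinfo.cc.
inline int NumCPUs() {
  const unsigned n = std::thread::hardware_concurrency();
  return n > 0 ? static_cast<int>(n) : 1;  // 0 means "unknown"; assume 1
}
}  // namespace base

namespace kudu {
// Returns the flag value when it is set to a positive number, otherwise the
// auto-detected CPU count.
inline int NumCPUs() {
  const int detected = base::NumCPUs();
  assert(detected > 0);
  return FLAGS_num_cpus > 0 ? FLAGS_num_cpus : detected;
}
}  // namespace kudu
```

Keeping the override in the wrapper leaves gutil free of any dependency on
the util module, which is the point of step 3.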


-- 

Gerrit-MessageType: comment
Gerrit-Change-Id: I81b70f63923078d449f6541a61b292517e49877d
Gerrit-PatchSet: 6
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Brock Noland 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Brock Noland 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon 
Gerrit-Reviewer: Will Berkeley 
Gerrit-HasComments: Yes


[kudu-CR] [tests] fix test which fails with two cpus and document other dependencies

2016-09-25 Thread Brock Noland (Code Review)
Hello Kudu Jenkins,

I'd like you to reexamine a change.  Please visit

http://gerrit.cloudera.org:8080/4446

to look at the new patch set (#6).

Change subject: [tests] fix test which fails with two cpus and document other 
dependencies
..

[tests] fix test which fails with two cpus and document other dependencies

Various tests fail due to lack of lsof and low resource
limits, which are not documented.

CodegenTest.TestCodeCache fails on my two-CPU host (t2.large);
increasing the cache size resolves this failure. The test
depends on:

1) key skew in the SharedLRUCache to obtain hits
2) the fact that capacity of SharedLRUCache is higher than the capacity
configured if the configured capacity is not divisible by the number
of CPUs.

For example, the capacity is set here:

  FLAGS_codegen_cache_capacity = 10;

However, if the capacity is not perfectly divisible by the number of CPUs,
actual capacity is slightly higher.

CPU 2 => Capacity 10, 5/shard
CPU 4 => Capacity 12, 3/shard
CPU 8 => Capacity 16, 2/shard

Due to this calculation:

const size_t per_shard = (capacity + (num_shards - 1)) / num_shards;

Additionally, the test depends on key skew. For example, I added some temporary
logging which logged each insert. Let's look at inserts into shard 0. Under the
4 CPU case, where each shard has a capacity of 3, shard 0 only sees three inserts
in pass 0 resulting in hits on the next pass:

pass: 0
Insert: hash = 460595995, shard = 0
Insert: hash = 339190469, shard = 0
Insert: hash = 326003543, shard = 0
pass: 1

Under the two CPU case, both shard's see more than 5 inserts, causing no cache hits.

pass: 0
Insert: hash = 1886151623, shard = 0
Insert: hash = 1395239506, shard = 0
Insert: hash = 1931154674, shard = 0
Insert: hash = 460595995, shard = 0
Insert: hash = 1440596256, shard = 0
Insert: hash = 1870227699, shard = 0
Insert: hash = 1163308785, shard = 0
Insert: hash = 1980547462, shard = 0
Insert: hash = 1106104592, shard = 0
Insert: hash = 1702846352, shard = 0
Insert: hash = 1230845174, shard = 0
Insert: hash = 1903296752, shard = 0
Insert: hash = 1395526688, shard = 0
Insert: hash = 339190469, shard = 0
Insert: hash = 1540160781, shard = 0
Insert: hash = 1377131543, shard = 0
Insert: hash = 2125989246, shard = 0
Insert: hash = 326003543, shard = 0
pass: 1
Insert: hash = 1886151623, shard = 0
Insert: hash = 1395239506, shard = 0
Insert: hash = 1931154674, shard = 0
Insert: hash = 460595995, shard = 0
Insert: hash = 1440596256, shard = 0
Insert: hash = 1870227699, shard = 0
Insert: hash = 1163308785, shard = 0
Insert: hash = 1980547462, shard = 0
Insert: hash = 1106104592, shard = 0
Insert: hash = 1702846352, shard = 0
Insert: hash = 1230845174, shard = 0
Insert: hash = 1903296752, shard = 0
Insert: hash = 1395526688, shard = 0
Insert: hash = 339190469, shard = 0
Insert: hash = 1540160781, shard = 0
Insert: hash = 1377131543, shard = 0
Insert: hash = 2125989246, shard = 0
Insert: hash = 326003543, shard = 0

AFAICT increasing the capacity of the cache doesn't impact correctness.

Change-Id: I81b70f63923078d449f6541a61b292517e49877d
---
M docs/installation.adoc
M src/kudu/codegen/codegen-test.cc
M src/kudu/gutil/sysinfo.cc
3 files changed, 11 insertions(+), 3 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/46/4446/6
-- 

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I81b70f63923078d449f6541a61b292517e49877d
Gerrit-PatchSet: 6
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Brock Noland 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Brock Noland 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon 
Gerrit-Reviewer: Will Berkeley 


[kudu-CR] [tests] fix test which fails with two cpus and document other dependencies

2016-09-25 Thread Brock Noland (Code Review)
Brock Noland has posted comments on this change.

Change subject: [tests] fix test which fails with two cpus and document other 
dependencies
..


Patch Set 5:

Makes sense. I thought /proc/sys/kernel/pid_max was higher on 64-bit systems
than the 32-bit default.

-- 

Gerrit-MessageType: comment
Gerrit-Change-Id: I81b70f63923078d449f6541a61b292517e49877d
Gerrit-PatchSet: 5
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Brock Noland 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Brock Noland 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon 
Gerrit-Reviewer: Will Berkeley 
Gerrit-HasComments: No


[kudu-CR] [tests] fix test which fails with two cpus and document other dependencies

2016-09-25 Thread Brock Noland (Code Review)
Brock Noland has posted comments on this change.

Change subject: [tests] fix test which fails with two cpus and document other 
dependencies
..


Patch Set 5:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/4446/5/docs/installation.adoc
File docs/installation.adoc:

Line 56: - Limits nproc and nofile greater than 32768
> which tests fail with nproc <= 32768? If this is just a test requirement, I
I didn't hit the nproc limit; I hit nofile. However, given the low default
settings for both, I was suggesting we document a reasonable general setting,
which is why I placed it here. Happy to move it somewhere else or adjust as
needed.

BTW, CM sets both limits to:

Max processes    65536    65536    processes
Max open files   32768    32768    files

for everything running as a child process. As such, it might make sense to 
ensure the most tested configuration is specified here.
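
If it helps whoever documents this, a small sketch of checking the current
soft limits from C++ with the POSIX getrlimit() call. SoftLimitAtLeast is a
hypothetical helper, and the threshold to compare against is left to the
caller since the right value is still under discussion here. RLIMIT_NPROC is
available on Linux and the BSDs but is not part of POSIX.

```cpp
#include <sys/resource.h>

#include <cassert>
#include <cstdint>

// Returns true if the soft limit for `resource` (e.g. RLIMIT_NOFILE or
// RLIMIT_NPROC) is at least `minimum`. Returns false if the limit cannot be
// read.
bool SoftLimitAtLeast(int resource, std::uint64_t minimum) {
  struct rlimit lim;
  if (getrlimit(resource, &lim) != 0) return false;
  assert(lim.rlim_cur <= lim.rlim_max);  // soft limit never exceeds hard limit
  if (lim.rlim_cur == RLIM_INFINITY) return true;  // unlimited
  return static_cast<std::uint64_t>(lim.rlim_cur) >= minimum;
}
```

A test harness could call, for example, SoftLimitAtLeast(RLIMIT_NOFILE, 32768)
up front and print a clear message instead of failing later with an obscure
"too many open files" error.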


-- 

Gerrit-MessageType: comment
Gerrit-Change-Id: I81b70f63923078d449f6541a61b292517e49877d
Gerrit-PatchSet: 5
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Brock Noland 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Brock Noland 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon 
Gerrit-Reviewer: Will Berkeley 
Gerrit-HasComments: Yes


[kudu-CR] [tests] fix test which fails with two cpus and document other dependencies

2016-09-25 Thread Todd Lipcon (Code Review)
Todd Lipcon has posted comments on this change.

Change subject: [tests] fix test which fails with two cpus and document other 
dependencies
..


Patch Set 5:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/4446/5/docs/installation.adoc
File docs/installation.adoc:

Line 56: - Limits nproc and nofile greater than 32768
Which tests fail with nproc <= 32768? If this is just a test requirement, I
don't think it should be in the installation adoc. Plus, I'm surprised we use
that many threads in any test (perhaps it's a bug).


http://gerrit.cloudera.org:8080/#/c/4446/5/src/kudu/gutil/sysinfo.cc
File src/kudu/gutil/sysinfo.cc:

Line 63: DEFINE_int32(num_cpus, 0, "Override number of CPUs by setting to a value > 0");
Can you flag this as advanced? Also, perhaps say something like 'Override the
auto-detected number of CPUs on this system', since you can't actually override
the number of CPUs :)


-- 

Gerrit-MessageType: comment
Gerrit-Change-Id: I81b70f63923078d449f6541a61b292517e49877d
Gerrit-PatchSet: 5
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Brock Noland 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Brock Noland 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon 
Gerrit-Reviewer: Will Berkeley 
Gerrit-HasComments: Yes


[kudu-CR] [tests] fix test which fails with two cpus and document other dependencies

2016-09-25 Thread Brock Noland (Code Review)
Hello Kudu Jenkins,

I'd like you to reexamine a change.  Please visit

http://gerrit.cloudera.org:8080/4446

to look at the new patch set (#5).

Change subject: [tests] fix test which fails with two cpus and document other 
dependencies
..

[tests] fix test which fails with two cpus and document other dependencies

Various tests fail due to lack of lsof and low resource
limits, which are not documented.

CodegenTest.TestCodeCache fails on my two-CPU host (t2.large);
increasing the cache size resolves this failure. The test
depends on:

1) key skew in the SharedLRUCache to obtain hits
2) the fact that capacity of SharedLRUCache is higher than the capacity
configured if the configured capacity is not divisible by the number
of CPUs.

For example, the capacity is set here:

  FLAGS_codegen_cache_capacity = 10;

However, if the capacity is not perfectly divisible by the number of CPUs,
actual capacity is slightly higher.

CPU 2 => Capacity 10, 5/shard
CPU 4 => Capacity 12, 3/shard
CPU 8 => Capacity 16, 2/shard

Due to this calculation:

const size_t per_shard = (capacity + (num_shards - 1)) / num_shards;

Additionally, the test depends on key skew. For example, I added some temporary
logging which logged each insert. Let's look at inserts into shard 0. Under the
4 CPU case, where each shard has a capacity of 3, shard 0 only sees three inserts
in pass 0 resulting in hits on the next pass:

pass: 0
Insert: hash = 460595995, shard = 0
Insert: hash = 339190469, shard = 0
Insert: hash = 326003543, shard = 0
pass: 1

Under the two CPU case, both shard's see more than 5 inserts, causing no cache hits.

pass: 0
Insert: hash = 1886151623, shard = 0
Insert: hash = 1395239506, shard = 0
Insert: hash = 1931154674, shard = 0
Insert: hash = 460595995, shard = 0
Insert: hash = 1440596256, shard = 0
Insert: hash = 1870227699, shard = 0
Insert: hash = 1163308785, shard = 0
Insert: hash = 1980547462, shard = 0
Insert: hash = 1106104592, shard = 0
Insert: hash = 1702846352, shard = 0
Insert: hash = 1230845174, shard = 0
Insert: hash = 1903296752, shard = 0
Insert: hash = 1395526688, shard = 0
Insert: hash = 339190469, shard = 0
Insert: hash = 1540160781, shard = 0
Insert: hash = 1377131543, shard = 0
Insert: hash = 2125989246, shard = 0
Insert: hash = 326003543, shard = 0
pass: 1
Insert: hash = 1886151623, shard = 0
Insert: hash = 1395239506, shard = 0
Insert: hash = 1931154674, shard = 0
Insert: hash = 460595995, shard = 0
Insert: hash = 1440596256, shard = 0
Insert: hash = 1870227699, shard = 0
Insert: hash = 1163308785, shard = 0
Insert: hash = 1980547462, shard = 0
Insert: hash = 1106104592, shard = 0
Insert: hash = 1702846352, shard = 0
Insert: hash = 1230845174, shard = 0
Insert: hash = 1903296752, shard = 0
Insert: hash = 1395526688, shard = 0
Insert: hash = 339190469, shard = 0
Insert: hash = 1540160781, shard = 0
Insert: hash = 1377131543, shard = 0
Insert: hash = 2125989246, shard = 0
Insert: hash = 326003543, shard = 0

AFAICT increasing the capacity of the cache doesn't impact correctness.

Change-Id: I81b70f63923078d449f6541a61b292517e49877d
---
M docs/installation.adoc
M src/kudu/codegen/codegen-test.cc
M src/kudu/gutil/sysinfo.cc
3 files changed, 10 insertions(+), 3 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/46/4446/5
-- 

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I81b70f63923078d449f6541a61b292517e49877d
Gerrit-PatchSet: 5
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Brock Noland 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Brock Noland 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon 
Gerrit-Reviewer: Will Berkeley 


[kudu-CR] [tests] fix test which fails with two cpus and document other dependencies

2016-09-24 Thread Brock Noland (Code Review)
Hello Kudu Jenkins,

I'd like you to reexamine a change.  Please visit

http://gerrit.cloudera.org:8080/4446

to look at the new patch set (#4).

Change subject: [tests] fix test which fails with two cpus and document other 
dependencies
..

[tests] fix test which fails with two cpus and document other dependencies

Various tests fail due to lack of lsof and low resource
limits, which are not documented.

CodegenTest.TestCodeCache fails on my two-CPU host (t2.large);
increasing the cache size resolves this failure. The test
depends on:

1) key skew in the SharedLRUCache to obtain hits
2) the fact that capacity of SharedLRUCache is higher than the capacity
configured if the configured capacity is not divisible by the number
of CPUs.

For example, the capacity is set here:

  FLAGS_codegen_cache_capacity = 10;

However, if the capacity is not perfectly divisible by the number of CPUs,
actual capacity is slightly higher.

CPU 2 => Capacity 10, 5/shard
CPU 4 => Capacity 12, 3/shard
CPU 8 => Capacity 16, 2/shard

Due to this calculation:

const size_t per_shard = (capacity + (num_shards - 1)) / num_shards;

Additionally, the test depends on key skew. For example, I added some temporary
logging which logged each insert. Let's look at inserts into shard 0. Under the
4 CPU case, where each shard has a capacity of 3, shard 0 only sees three inserts
in pass 0 resulting in hits on the next pass:

pass: 0
Insert: hash = 460595995, shard = 0
Insert: hash = 339190469, shard = 0
Insert: hash = 326003543, shard = 0
pass: 1

Under the two CPU case, both shard's see more than 5 inserts, causing no cache hits.

pass: 0
Insert: hash = 1886151623, shard = 0
Insert: hash = 1395239506, shard = 0
Insert: hash = 1931154674, shard = 0
Insert: hash = 460595995, shard = 0
Insert: hash = 1440596256, shard = 0
Insert: hash = 1870227699, shard = 0
Insert: hash = 1163308785, shard = 0
Insert: hash = 1980547462, shard = 0
Insert: hash = 1106104592, shard = 0
Insert: hash = 1702846352, shard = 0
Insert: hash = 1230845174, shard = 0
Insert: hash = 1903296752, shard = 0
Insert: hash = 1395526688, shard = 0
Insert: hash = 339190469, shard = 0
Insert: hash = 1540160781, shard = 0
Insert: hash = 1377131543, shard = 0
Insert: hash = 2125989246, shard = 0
Insert: hash = 326003543, shard = 0
pass: 1
Insert: hash = 1886151623, shard = 0
Insert: hash = 1395239506, shard = 0
Insert: hash = 1931154674, shard = 0
Insert: hash = 460595995, shard = 0
Insert: hash = 1440596256, shard = 0
Insert: hash = 1870227699, shard = 0
Insert: hash = 1163308785, shard = 0
Insert: hash = 1980547462, shard = 0
Insert: hash = 1106104592, shard = 0
Insert: hash = 1702846352, shard = 0
Insert: hash = 1230845174, shard = 0
Insert: hash = 1903296752, shard = 0
Insert: hash = 1395526688, shard = 0
Insert: hash = 339190469, shard = 0
Insert: hash = 1540160781, shard = 0
Insert: hash = 1377131543, shard = 0
Insert: hash = 2125989246, shard = 0
Insert: hash = 326003543, shard = 0

AFAICT increasing the capacity of the cache doesn't impact correctness.

Change-Id: I81b70f63923078d449f6541a61b292517e49877d
---
M docs/installation.adoc
M src/kudu/codegen/codegen-test.cc
M src/kudu/gutil/sysinfo.cc
3 files changed, 10 insertions(+), 3 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/46/4446/4
-- 

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I81b70f63923078d449f6541a61b292517e49877d
Gerrit-PatchSet: 4
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Brock Noland 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Brock Noland 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon 
Gerrit-Reviewer: Will Berkeley