Jim Apple has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/9085 )

Change subject: IMPALA-6070: Expose using Docker to run tests faster.
......................................................................


Patch Set 3:

(39 comments)

http://gerrit.cloudera.org:8080/#/c/9085/1//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/9085/1//COMMIT_MSG@28
PS1, Line 28: Besides the obvious win of getting test results more quickly, this
> They do. Until you do a "docker commit" and re-start the container with the
For my information: why does that change things?


http://gerrit.cloudera.org:8080/#/c/9085/3//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/9085/3//COMMIT_MSG@41
PS3, Line 41: as
nit: "and"?


http://gerrit.cloudera.org:8080/#/c/9085/3//COMMIT_MSG@48
PS3, Line 48: IMPALA-6494
I think this is probably a blocker for me.


http://gerrit.cloudera.org:8080/#/c/9085/3/bin/start-impala-cluster.py
File bin/start-impala-cluster.py:

http://gerrit.cloudera.org:8080/#/c/9085/3/bin/start-impala-cluster.py@221
PS3, Line 221:   system_memory = psutil.virtual_memory().total
What's the rationale for this change from ps1?


http://gerrit.cloudera.org:8080/#/c/9085/1/docker/annotate.pl
File docker/annotate.pl:

http://gerrit.cloudera.org:8080/#/c/9085/1/docker/annotate.pl@26
PS1, Line 26: }
> Perl's already around in any Debian system, so it's not really adding a new
Yes, I'm not thrilled by the introduction of another PL.

    curl -s 
https://jenkins.impala.io/job/ubuntu-16.04-from-scratch/lastSuccessfulBuild/consoleText
 | wc -l

is less than 10,000 lines for me, so the performance penalty is low.


http://gerrit.cloudera.org:8080/#/c/9085/1/docker/entrypoint.sh
File docker/entrypoint.sh:

http://gerrit.cloudera.org:8080/#/c/9085/1/docker/entrypoint.sh@20
PS1, Line 20: # Entrypoint code for test-with-docker.py containers. 
test-with-docker.py
> Elaborated a little bit. Pointed to top of test-with-docker.py.
Still, when I cd into a directory, it's nice to know which file to look at 
first. Maybe a README that says "look at the top of test-with-docker.py"?


http://gerrit.cloudera.org:8080/#/c/9085/1/docker/entrypoint.sh@79
PS1, Line 79:
> The only real intersection is "service postgresql start" and "service ssh s
I think there is more overlap. The hosts munging is in both places, for 
instance, and the other things you write are useful additions to that file.


http://gerrit.cloudera.org:8080/#/c/9085/1/docker/entrypoint.sh@142
PS1, Line 142: $x-orig
> I ran into this the hard way and largely fixed it (see pointer below).
I'd suggest changing GVD back for some time while you and I work through this 
code review. That'll give -build_shared_libs some "burn-in" time.


http://gerrit.cloudera.org:8080/#/c/9085/1/docker/entrypoint.sh@150
PS1, Line 150:     if sudo -u postgres psql -c "select 1"; then
> Done.
This variant of exec always returns true. The other version of exec can build 
command lines too long for rm to use. You could try

    find be -name '*.o' -print0 | xargs -0 -L 42 rm --


http://gerrit.cloudera.org:8080/#/c/9085/3/docker/entrypoint.sh
File docker/entrypoint.sh:

http://gerrit.cloudera.org:8080/#/c/9085/3/docker/entrypoint.sh@33
PS3, Line 33: <name>
nits: what is "<name>"? How is it used - just an empty directory?


http://gerrit.cloudera.org:8080/#/c/9085/3/docker/entrypoint.sh@75
PS3, Line 75: busily
nit: Can you leave a note explaining why we want to sleep busily, rather than 
idly?


http://gerrit.cloudera.org:8080/#/c/9085/3/docker/entrypoint.sh@178
PS3, Line 178:   cd /home/impdev/Impala
pushd, the popd trap on function return or exit?


http://gerrit.cloudera.org:8080/#/c/9085/3/docker/entrypoint.sh@245
PS3, Line 245: IMPALAD_MEM_LIMIT_BYTES
Can you add a comment about where this variable comes from?


http://gerrit.cloudera.org:8080/#/c/9085/3/docker/entrypoint.sh@249
PS3, Line 249:     # TODO: session-expiry-test is inconsistent when there isn't
Can you file a ticket along with the error message you get?


http://gerrit.cloudera.org:8080/#/c/9085/3/docker/monitor.py
File docker/monitor.py:

http://gerrit.cloudera.org:8080/#/c/9085/3/docker/monitor.py@20
PS3, Line 20: # Monitors Docker containers for CPU and memory usage.
Can you add some example invocations?


http://gerrit.cloudera.org:8080/#/c/9085/3/docker/monitor.py@21
PS3, Line 21: # Prepares a timeline based on said monitoring.
What format is that timeline in? What file is it in?


http://gerrit.cloudera.org:8080/#/c/9085/3/docker/monitor.py@61
PS3, Line 61:   if total_gb and available_gb:
What if available_gb is 0.0? Can that happen on a machine with swap?


http://gerrit.cloudera.org:8080/#/c/9085/3/docker/monitor.py@96
PS3, Line 96:     t = line[:ts_length]
nit: this and the next section can be de-duped somewhat.


http://gerrit.cloudera.org:8080/#/c/9085/3/docker/monitor.py@122
PS3, Line 122: usercpu
What is usercpu? I mean, what are the units?


http://gerrit.cloudera.org:8080/#/c/9085/3/docker/monitor.py@123
PS3, Line 123: memory.stat
Where is memory.stat found?


http://gerrit.cloudera.org:8080/#/c/9085/3/docker/monitor.py@131
PS3, Line 131:     frequency_seconds is how often metrics are gathered
nit: all on one line


http://gerrit.cloudera.org:8080/#/c/9085/3/docker/monitor.py@150
PS3, Line 150: which an
nit: I think you accidentally a word


http://gerrit.cloudera.org:8080/#/c/9085/3/docker/monitor.py@154
PS3, Line 154:   def _metrics_from_stat_file(root, container, stat):
Generally, it's helpful to know the type (and format, for strings) of the 
parameters and returned value, as well as some semantic information about the 
parameters, like "this is a string and it should be a filename that does not 
yet exist" (or whatever)


http://gerrit.cloudera.org:8080/#/c/9085/3/docker/monitor.py@156
PS3, Line 156: os.path.join(root, "docker", container.id_)
nit: repeated below; factor out


http://gerrit.cloudera.org:8080/#/c/9085/3/docker/monitor.py@163
PS3, Line 163: disappear
That should throw a specific exception, yes? Can you catch just that?


http://gerrit.cloudera.org:8080/#/c/9085/3/docker/monitor.py@170
PS3, Line 170: a log file
in particular, the one set in the constructor.


http://gerrit.cloudera.org:8080/#/c/9085/3/docker/monitor.py@176
PS3, Line 176: ["findmnt", "-n", "-o", "TARGET", "-t", "cgroup", "--source", 
"cgroup"]
nit: "findmnt -n -o TARGET -t cgroup --source cgroup".split()


http://gerrit.cloudera.org:8080/#/c/9085/3/docker/monitor.py@221
PS3, Line 221:     Returns a tuple of (name, timestamp, line) for interesting 
lines in the file
nit: this and a couple other doc strings don't quite meet our usual pattern, 
PEP-257.


http://gerrit.cloudera.org:8080/#/c/9085/3/docker/monitor.py@221
PS3, Line 221: Returns a tuple
It doesn't return a tuple, right -- it returns a list of tuples?


http://gerrit.cloudera.org:8080/#/c/9085/3/docker/monitor.py@226
PS3, Line 226: parsed_interesting_lines
nit: why name it -- just return it.


http://gerrit.cloudera.org:8080/#/c/9085/3/docker/monitor.py@230
PS3, Line 230: self
self unused; Can be static method?


http://gerrit.cloudera.org:8080/#/c/9085/3/docker/monitor.py@270
PS3, Line 270: logfile_timeline
This is called only once, it seems. Can we call it with just C and have 
logfile_timeline get the name and the logfile?


http://gerrit.cloudera.org:8080/#/c/9085/3/docker/monitor.py@292
PS3, Line 292: mterics_by_container
nit: spelling


http://gerrit.cloudera.org:8080/#/c/9085/3/docker/monitor.py@316
PS3, Line 316:       o.write(
nit: is there a cleaner way to copy file contents?


http://gerrit.cloudera.org:8080/#/c/9085/1/docker/test-with-docker.py
File docker/test-with-docker.py:

http://gerrit.cloudera.org:8080/#/c/9085/1/docker/test-with-docker.py@116
PS1, Line 116: import multiprocessing
> Looks like free has added an available column! That was news to me.
free_lines = subprocess.check_output(["free", "-b", "-w"]).split('\n')
    free_grid = map(lambda x: x.split(), free_lines)
    total_idx = free_grid[0].index("total")
    available_idx = free_grid[0].index("available")
    int(free_grid[1][1 + total_idx]) - int(free_grid[1][1 + available_idx])

I'd imagine this also is less finnicky with older kernels.


http://gerrit.cloudera.org:8080/#/c/9085/3/docker/timeline.html.template
File docker/timeline.html.template:

http://gerrit.cloudera.org:8080/#/c/9085/3/docker/timeline.html.template@19
PS3, Line 19: Runs tests inside of docker containers, parallelizing different 
types of
            : tests
That's not what this file actually does, right?


http://gerrit.cloudera.org:8080/#/c/9085/3/docker/timeline.html.template@53
PS3, Line 53: 0, 0, 0, 0, 0, 0,
What are these for? Maybe add a comment like /* days */ to some parameters. 
Also, please dedup with next expression.


http://gerrit.cloudera.org:8080/#/c/9085/3/docker/timeline.html.template@53
PS3, Line 53:     dataTable.addRow([ row[0], row[1], new Date(0, 0, 0, 0, 0, 0, 
1000 * row[2]), new Date(0, 0, 0, 0, 0, 0, 1000*row[3]) ]);
nit: long line


http://gerrit.cloudera.org:8080/#/c/9085/3/docker/timeline.html.template@72
PS3, Line 72:       //  width: "100px"
Elide



--
To view, visit http://gerrit.cloudera.org:8080/9085
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I82052ef31979564968effef13a3c6af0d5c62767
Gerrit-Change-Number: 9085
Gerrit-PatchSet: 3
Gerrit-Owner: Philip Zeyliger <[email protected]>
Gerrit-Reviewer: Jim Apple <[email protected]>
Gerrit-Reviewer: Lars Volker <[email protected]>
Gerrit-Reviewer: Philip Zeyliger <[email protected]>
Gerrit-Comment-Date: Mon, 26 Feb 2018 01:52:01 +0000
Gerrit-HasComments: Yes

Reply via email to