Joe McDonnell has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/18738 )

Change subject: IMPALA-9442: Add Ozone to minicluster
......................................................................


Patch Set 12:

(6 comments)

This is looking really good

http://gerrit.cloudera.org:8080/#/c/18738/12/java/executor-deps/pom.xml
File java/executor-deps/pom.xml:

http://gerrit.cloudera.org:8080/#/c/18738/12/java/executor-deps/pom.xml@214
PS12, Line 214:       <groupId>org.apache.ozone</groupId>
              :       <artifactId>ozone-filesystem-hadoop3</artifactId>
              :       <version>${ozone.version}</version>
This difference in the groupId/artifactId will be a problem for the 
USE_APACHE_OZONE=false configuration.


http://gerrit.cloudera.org:8080/#/c/18738/12/testdata/cluster/admin
File testdata/cluster/admin:

http://gerrit.cloudera.org:8080/#/c/18738/12/testdata/cluster/admin@472
PS12, Line 472:   rm -rf "$IMPALA_CLUSTER_NODES_DIR/$NODE_PREFIX"*/data/ozone
> Should this be more granular to avoid cleaning HDFS when using Ozone (and O
This is ok. If we find we need it, we can add it later.


http://gerrit.cloudera.org:8080/#/c/18738/12/testdata/cluster/node_templates/common/etc/hadoop/conf/ozone-site.xml.tmpl
File testdata/cluster/node_templates/common/etc/hadoop/conf/ozone-site.xml.tmpl:

http://gerrit.cloudera.org:8080/#/c/18738/12/testdata/cluster/node_templates/common/etc/hadoop/conf/ozone-site.xml.tmpl@1
PS12, Line 1: <?xml version="1.0" encoding="UTF-8" standalone="yes"?>
One overarching comment: We added the python-based configurations to be a 
replacement for the .tmpl files, so I think we should use that for new 
configuration files. See core-site.xml.py.


http://gerrit.cloudera.org:8080/#/c/18738/12/testdata/cluster/node_templates/common/etc/hadoop/conf/ozone-site.xml.tmpl@23
PS12, Line 23:   <property>
             :     <name>ozone.scm.block.client.address</name>
             :     <value>${INTERNAL_LISTEN_HOST}</value>
             :   </property>
             :   <property>
             :     <name>ozone.scm.client.address</name>
             :     <value>${INTERNAL_LISTEN_HOST}</value>
             :   </property>
             :   <property>
             :     <name>ozone.scm.names</name>
             :     <value>${INTERNAL_LISTEN_HOST}</value>
             :   </property>
             :   <property>
             :     <name>ozone.om.address</name>
             :     <value>${INTERNAL_LISTEN_HOST}</value>
             :   </property>
In our upstream dockerized tests, Impala is running as docker containers inside 
a docker network, while the minicluster runs outside the docker network. If a 
minicluster component needs to be accessible from inside the docker network 
(i.e. from Impala), then it needs to bind to EXTERNAL_LISTEN_HOST rather than 
INTERNAL_LISTEN_HOST.

It's ok to defer this for Ozone, because we haven't planned on running that 
test form factor yet. But if you already know the answer, then we can go ahead 
and incorporate that.


http://gerrit.cloudera.org:8080/#/c/18738/12/testdata/cluster/node_templates/common/etc/init.d/ozone-common
File testdata/cluster/node_templates/common/etc/init.d/ozone-common:

http://gerrit.cloudera.org:8080/#/c/18738/12/testdata/cluster/node_templates/common/etc/init.d/ozone-common@19
PS12, Line 19: export OZONE_OPTS="-XX:ParallelGCThreads=8 
-XX:+UseConcMarkSweepGC -XX:CMSInitiatingOccupancyFraction=70 
-XX:+CMSParallelRemarkEnabled"
Do you know if the Ozone scripts limit the JVM consumption for Ozone services? 
For example, we use xmx to limit JVM memory for various services, like Hive 
here:
https://github.com/apache/impala/blob/master/testdata/bin/run-hive-server.sh#L133


http://gerrit.cloudera.org:8080/#/c/18738/12/tests/metadata/test_ddl.py
File tests/metadata/test_ddl.py:

http://gerrit.cloudera.org:8080/#/c/18738/12/tests/metadata/test_ddl.py@75
PS12, Line 75:     assert self.filesystem_client.exists(
             :       
trash_path('test-warehouse/{0}.db/t1/t1.txt'.format(unique_database)))
             :     assert self.filesystem_client.exists(
             :       
trash_path('test-warehouse/{0}.db/t1'.format(unique_database)))
Nit: Since the path inside the trash is the same, I think I would prefer not to 
pass it into the trash_path() function. i.e.

assert self.filesystem_client.exists(
  '{0}/test-warehouse/{1}.db/t1/t1.txt'.format(trash_path(), unique_database))



--
To view, visit http://gerrit.cloudera.org:8080/18738
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ibf8b0f7b2d685d8b011df1926e12bf5434b5a2be
Gerrit-Change-Number: 18738
Gerrit-PatchSet: 12
Gerrit-Owner: Michael Smith <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Reviewer: Joe McDonnell <[email protected]>
Gerrit-Reviewer: Michael Smith <[email protected]>
Gerrit-Comment-Date: Tue, 26 Jul 2022 23:45:46 +0000
Gerrit-HasComments: Yes

Reply via email to