[
https://issues.apache.org/jira/browse/YARN-3366?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14482789#comment-14482789
]
Varun Vasudev commented on YARN-3366:
-------------------------------------
Thanks for the patch [~sidharta-s]! Feedback below.
# In YarnConfiguration.java
{noformat}
/**
- * True if linux-container-executor should limit itself to one user
+ * If linux-container-executor should limit itself to one user
* when running in non-secure mode.
*/
- public static final String NM_NONSECURE_MODE_LIMIT_USERS = NM_PREFIX +
+ public static final String NM_NONSECURE_MODE_LIMIT_USERS= NM_PREFIX +
"linux-container-executor.nonsecure-mode.limit-users";
- public static final boolean DEFAULT_NM_NONSECURE_MODE_LIMIT_USERS = true;
+ public static final boolean DEFAULT_NM_NONSECURE_MODE_LIMIT_USERS = true;
{noformat}
It looks like these are unnecessary changes. Can you please remove them?
# In TrafficController.java
{noformat}
if (LOG.isInfoEnabled()) {
LOG.info("NM recovery is not enabled.");
}
{noformat}
{noformat}
if (LOG.isInfoEnabled()) {
LOG.info("TC configuration is incomplete.");
}
{noformat}
Can you change these to debug? It doesn't seem to be something that needs to be
logged by the class.
# In TrafficController.java
{noformat}
else {
if (LOG.isWarnEnabled()) {
String logLine = new StringBuffer("Failed to match regex: ")
.append(regex).append(" Current state:
").append(state).toString();
LOG.warn(logLine);
return false;
}
}
{noformat}
Shouldn't the return be outside the warn enabled check?
# In TrafficController.java
{noformat}
//This could happen if the interface is already in its default state.
//Ignoring.
//throw new ResourceHandlerException("Failed to wipe tc state", e);
{noformat}
The comments are in a different block than the warn message. Also, the
commented throw is confusing.
# Minor nit - In TrafficController.java, function parseStatsString, the
continue isn't really required
# In TrafficControlBandwidthHandlerImpl.java - Unused import import
com.google.common.annotations.VisibleForTesting
# In TrafficControlBandwidthHandlerImpl.java
{noformat}
LOG.info("strict mode is set to : " + strictMode);
{noformat}
{noformat}
LOG.info("Attempting to reacquire classId for container: " +
containerIdStr);
{noformat}
Change levels to debug?
# In TrafficControlBandwidthHandlerImpl.java
{noformat}
String opArg = new StringBuffer(PrivilegedOperation.CGROUP_ARG_PREFIX)
.append(tasksFile).toString();
{noformat}
You can use the String class itself instead of StringBuffer?
# In TrafficControlBandwidthHandlerImpl.java
{noformat}
if (LOG.isWarnEnabled()) {
LOG.warn("teardown(): Nothing to do");
}
{noformat}
Why are you logging a warning?
# In TestTrafficControlBandwidthHandlerImpl.java and TestTrafficController.java
{noformat}
Assert.assertTrue("Caught unexpected ResourceHandlerException!", false);
{noformat}
User Assert.fail? This pattern is used in multiple places.
# In LinuxContainerExecutor.java.java
{noformat}
} catch (ResourceHandlerException e) {
+ if (LOG.isWarnEnabled()) {
+ LOG.warn("ResourceHandlerChain.reacquireContainer failed for " +
+ "containerId: " + containerId);
+ }
{noformat}
Can you add the exception to the warn message?
# In LinuxContainerExecutor.java
{noformat}
} catch (ResourceHandlerException e) {
if (LOG.isWarnEnabled()) {
LOG.warn(e);
LOG.warn("ResourceHandlerChain.postComplete failed for " +
"containerId: " + containerId);
}
}
{noformat}
Merge the warn messages.
# In LinuxContainerExecutor.java
{noformat}
+ command.addAll(Arrays.asList(containerExecutorExe,
{noformat}
Remove the extra space added.
# In LinuxContainerExecutor.java
{noformat}
+ String tcCommandFile = null;
+
+ try {
+ if (resourceHandlerChain != null) {
+ List<PrivilegedOperation> ops = resourceHandlerChain
+ .preStart(container);
+
+ if (ops != null) {
+ List<PrivilegedOperation> resourceOps = new ArrayList<>();
+
+ resourceOps.add(new PrivilegedOperation
+ (PrivilegedOperation.OperationType.ADD_PID_TO_CGROUP,
+ resourcesOptions));
+
+ for (PrivilegedOperation op : ops) {
+ switch (op.getOperationType()) {
+ case ADD_PID_TO_CGROUP:
+ resourceOps.add(op);
+ break;
+ case TC_MODIFY_STATE:
+ tcCommandFile = op.getArguments().get(0);
+ default:
+ if (LOG.isWarnEnabled()) {
+ LOG.warn("PrivilegedOperation type unsupported in launch: "
+ + op.getOperationType());
+ }
+ continue;
+ }
+ }
+
+ if (resourceOps.size() > 1) {
+ //squash resource operations
+ try {
+ PrivilegedOperation operation = PrivilegedOperationExecutor
+ .squashCGroupOperations(resourceOps);
+ resourcesOptions = operation.getArguments().get(0);
+ } catch (PrivilegedOperationException e) {
+ if (LOG.isErrorEnabled()) {
+ LOG.error("Failed to squash cgroup operations!", e);
+ }
+
+ throw new ResourceHandlerException("Failed to squash cgroup
operations!");
+ }
+ }
+ }
+ }
+ } catch (ResourceHandlerException e) {
+ if (LOG.isErrorEnabled()) {
+ LOG.error("ResourceHandlerChain.preStart() failed!", e);
+ }
+
+ throw new IOException("ResourceHandlerChain.preStart() failed!");
+ }
{noformat}
Can this block be made a method?
> Outbound network bandwidth : classify/shape traffic originating from YARN
> containers
> ------------------------------------------------------------------------------------
>
> Key: YARN-3366
> URL: https://issues.apache.org/jira/browse/YARN-3366
> Project: Hadoop YARN
> Issue Type: Sub-task
> Reporter: Sidharta Seethana
> Assignee: Sidharta Seethana
> Attachments: YARN-3366.001.patch, YARN-3366.002.patch
>
>
> In order to be able to isolate based on/enforce outbound traffic bandwidth
> limits, we need a mechanism to classify/shape network traffic in the
> nodemanager. For more information on the design, please see the attached
> design document in the parent JIRA.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)