[GitHub] drill pull request #1200: DRILL-143: Support CGROUPs resource management

2018-04-13 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/drill/pull/1200


---


[GitHub] drill pull request #1200: DRILL-143: Support CGROUPs resource management

2018-04-04 Thread kkhatua
Github user kkhatua commented on a diff in the pull request:

https://github.com/apache/drill/pull/1200#discussion_r179292376
  
--- Diff: distribution/src/resources/drillbit.sh ---
@@ -127,6 +127,44 @@ check_before_start()
   fi
 }
 
+check_after_start(){
+#check if the process is running
+if [ -f $pid ]; then
+  dbitProc=$(ps -ef | grep `cat $pid` | grep Drillbit)
+  if [ -n "$dbitProc" ]; then
+# Check and enforce for CGroup
+if [ -n "$DRILLBIT_CGROUP" ]; then 
+  check_and_enforce_cgroup `cat $pid`
+fi
+  fi
+fi
+}
+
+check_and_enforce_cgroup(){
+dbitPid=$1;
+#if [ $(`ps -o cgroup` | grep -c $DRILLBIT_CGROUP ) -eq 1 ]; then 
+if [ -f /cgroup/cpu/${DRILLBIT_CGROUP}/cgroup.procs ]; then 
+  echo $dbitPid > /cgroup/cpu/${DRILLBIT_CGROUP}/cgroup.procs
+  # Verify Enforcement
+  cgroupStatus=`grep -w $pid 
/cgroup/cpu/${DRILLBIT_CGROUP}/cgroup.procs`
+  if [ -z "$cgroupStatus" ]; then
+#Ref: 
https://www.kernel.org/doc/Documentation/scheduler/sched-bwc.txt
+let cpu_quota=`cat /cgroup/cpu/${DRILLBIT_CGROUP}/cpu.cfs_quota_us`
+let cpu_period=`cat 
/cgroup/cpu/${DRILLBIT_CGROUP}/cpu.cfs_period_us`
+if [ $cpu_period -gt 0 ] && [ $cpu_quota -gt 0 ]; then
+  coresAllowed="(up to "`echo $(( 100 * $cpu_quota / $cpu_period 
)) | sed 's/..$/.&/'`" cores allowed)"
+fi
+echo "WARN: Drillbit's CPU resource usage will be managed under 
the CGroup : $DRILLBIT_CGROUP "$coresAllowed
+  else
+echo "ERROR: Failed to add Drillbit to CGroup ( $DRILLBIT_CGROUP ) 
for resource usage management.  Ensure that the cgroup manages CPU"
+exit 1
+  fi
+else
+  echo "ERROR: cgroup $DRILLBIT_CGROUP does not found. Ensure that 
daemon is running and cgroup exists"
+  exit 1
--- End diff --

Good point. Ideally, we should prevent the Drillbit from starting up (or in 
this case, it should shut down), if CGroups couldn't be applied. My concern is 
that if CGroups is not being enforced, we're running a process that can 
(potentially) consume excess CPU resources. 
Should I shut down the Drillbit in such a scenario, or move on with just a 
WARN message?


---


[GitHub] drill pull request #1200: DRILL-143: Support CGROUPs resource management

2018-04-04 Thread kkhatua
Github user kkhatua commented on a diff in the pull request:

https://github.com/apache/drill/pull/1200#discussion_r179287166
  
--- Diff: distribution/src/resources/drillbit.sh ---
@@ -127,6 +127,44 @@ check_before_start()
   fi
 }
 
+check_after_start(){
+#check if the process is running
+if [ -f $pid ]; then
+  dbitProc=$(ps -ef | grep `cat $pid` | grep Drillbit)
+  if [ -n "$dbitProc" ]; then
+# Check and enforce for CGroup
+if [ -n "$DRILLBIT_CGROUP" ]; then 
+  check_and_enforce_cgroup `cat $pid`
+fi
+  fi
+fi
+}
+
+check_and_enforce_cgroup(){
+dbitPid=$1;
+#if [ $(`ps -o cgroup` | grep -c $DRILLBIT_CGROUP ) -eq 1 ]; then 
+if [ -f /cgroup/cpu/${DRILLBIT_CGROUP}/cgroup.procs ]; then 
+  echo $dbitPid > /cgroup/cpu/${DRILLBIT_CGROUP}/cgroup.procs
+  # Verify Enforcement
+  cgroupStatus=`grep -w $pid 
/cgroup/cpu/${DRILLBIT_CGROUP}/cgroup.procs`
+  if [ -z "$cgroupStatus" ]; then
+#Ref: 
https://www.kernel.org/doc/Documentation/scheduler/sched-bwc.txt
+let cpu_quota=`cat /cgroup/cpu/${DRILLBIT_CGROUP}/cpu.cfs_quota_us`
+let cpu_period=`cat 
/cgroup/cpu/${DRILLBIT_CGROUP}/cpu.cfs_period_us`
+if [ $cpu_period -gt 0 ] && [ $cpu_quota -gt 0 ]; then
+  coresAllowed="(up to "`echo $(( 100 * $cpu_quota / $cpu_period 
)) | sed 's/..$/.&/'`" cores allowed)"
+fi
+echo "WARN: Drillbit's CPU resource usage will be managed under 
the CGroup : $DRILLBIT_CGROUP "$coresAllowed
--- End diff --

Thought that since the Drill process is being brought up in a restricted 
mode, it should appear as a WARN.
I can switch it to an INFO because there is nothing compelling to use WARN.
Will make the change on the variable within quotes.


---


[GitHub] drill pull request #1200: DRILL-143: Support CGROUPs resource management

2018-04-04 Thread kkhatua
Github user kkhatua commented on a diff in the pull request:

https://github.com/apache/drill/pull/1200#discussion_r179286598
  
--- Diff: distribution/src/resources/drillbit.sh ---
@@ -127,6 +127,44 @@ check_before_start()
   fi
 }
 
+check_after_start(){
+#check if the process is running
+if [ -f $pid ]; then
+  dbitProc=$(ps -ef | grep `cat $pid` | grep Drillbit)
+  if [ -n "$dbitProc" ]; then
+# Check and enforce for CGroup
+if [ -n "$DRILLBIT_CGROUP" ]; then 
+  check_and_enforce_cgroup `cat $pid`
+fi
+  fi
+fi
+}
+
+check_and_enforce_cgroup(){
+dbitPid=$1;
+#if [ $(`ps -o cgroup` | grep -c $DRILLBIT_CGROUP ) -eq 1 ]; then 
+if [ -f /cgroup/cpu/${DRILLBIT_CGROUP}/cgroup.procs ]; then 
+  echo $dbitPid > /cgroup/cpu/${DRILLBIT_CGROUP}/cgroup.procs
+  # Verify Enforcement
+  cgroupStatus=`grep -w $pid 
/cgroup/cpu/${DRILLBIT_CGROUP}/cgroup.procs`
+  if [ -z "$cgroupStatus" ]; then
+#Ref: 
https://www.kernel.org/doc/Documentation/scheduler/sched-bwc.txt
+let cpu_quota=`cat /cgroup/cpu/${DRILLBIT_CGROUP}/cpu.cfs_quota_us`
--- End diff --

Ok. Will fix this 


---


[GitHub] drill pull request #1200: DRILL-143: Support CGROUPs resource management

2018-04-04 Thread kkhatua
Github user kkhatua commented on a diff in the pull request:

https://github.com/apache/drill/pull/1200#discussion_r179159323
  
--- Diff: distribution/src/resources/drillbit.sh ---
@@ -154,6 +192,7 @@ start_bit ( )
   nohup nice -n $DRILL_NICENESS "$DRILL_HOME/bin/runbit" exec ${args[@]} 
>> "$logout" 2>&1 &
   echo $! > $pid
   sleep 1
+  check_after_start
--- End diff --

😄 
Will fix this, thanks!


---


[GitHub] drill pull request #1200: DRILL-143: Support CGROUPs resource management

2018-04-04 Thread kkhatua
Github user kkhatua commented on a diff in the pull request:

https://github.com/apache/drill/pull/1200#discussion_r179158686
  
--- Diff: distribution/src/resources/drillbit.sh ---
@@ -127,6 +127,44 @@ check_before_start()
   fi
 }
 
+check_after_start(){
+#check if the process is running
+if [ -f $pid ]; then
+  dbitProc=$(ps -ef | grep `cat $pid` | grep Drillbit)
+  if [ -n "$dbitProc" ]; then
+# Check and enforce for CGroup
+if [ -n "$DRILLBIT_CGROUP" ]; then 
+  check_and_enforce_cgroup `cat $pid`
+fi
+  fi
+fi
+}
+
+check_and_enforce_cgroup(){
+dbitPid=$1;
+#if [ $(`ps -o cgroup` | grep -c $DRILLBIT_CGROUP ) -eq 1 ]; then 
--- End diff --

Thanks for catching that!


---


[GitHub] drill pull request #1200: DRILL-143: Support CGROUPs resource management

2018-04-04 Thread kkhatua
Github user kkhatua commented on a diff in the pull request:

https://github.com/apache/drill/pull/1200#discussion_r179158585
  
--- Diff: distribution/src/resources/drillbit.sh ---
@@ -127,6 +127,44 @@ check_before_start()
   fi
 }
 
+check_after_start(){
+#check if the process is running
+if [ -f $pid ]; then
+  dbitProc=$(ps -ef | grep `cat $pid` | grep Drillbit)
--- End diff --

Agreed. I'll incorporate this change.


---


[GitHub] drill pull request #1200: DRILL-143: Support CGROUPs resource management

2018-04-04 Thread kkhatua
Github user kkhatua commented on a diff in the pull request:

https://github.com/apache/drill/pull/1200#discussion_r179158191
  
--- Diff: distribution/src/resources/drill-env.sh ---
@@ -86,6 +86,12 @@
 
 #export DRILL_PID_DIR=${DRILL_PID_DIR:-$DRILL_HOME}
 
+# CGroup to which the Drillbit belong when running as a daemon using
--- End diff --

:)
I did speculate about enforcement for other resources, but it seems that 
CPU is the primary resource that needs to be managed. Memory 'management' (by 
YARN, for e.g.) otherwise works without the need for CGroup.


---


[GitHub] drill pull request #1200: DRILL-143: Support CGROUPs resource management

2018-04-04 Thread kkhatua
Github user kkhatua commented on a diff in the pull request:

https://github.com/apache/drill/pull/1200#discussion_r179156839
  
--- Diff: distribution/src/resources/drillbit.sh ---
@@ -127,6 +127,44 @@ check_before_start()
   fi
 }
 
+check_after_start(){
+#check if the process is running
+if [ -f $pid ]; then
+  dbitProc=$(ps -ef | grep `cat $pid` | grep Drillbit)
+  if [ -n "$dbitProc" ]; then
+# Check and enforce for CGroup
+if [ -n "$DRILLBIT_CGROUP" ]; then 
+  check_and_enforce_cgroup `cat $pid`
+fi
+  fi
+fi
+}
+
+check_and_enforce_cgroup(){
+dbitPid=$1;
+#if [ $(`ps -o cgroup` | grep -c $DRILLBIT_CGROUP ) -eq 1 ]; then 
+if [ -f /cgroup/cpu/${DRILLBIT_CGROUP}/cgroup.procs ]; then 
--- End diff --

I've looked to check for older versions, but wasn't sure which one aligned 
to which version and OS. It's easy to check if CGroups is running, but I don't 
see a guaranteed way of confirming specifically about that CGroup. 
Let me do some more research on this front.


---


[GitHub] drill pull request #1200: DRILL-143: Support CGROUPs resource management

2018-04-03 Thread paul-rogers
Github user paul-rogers commented on a diff in the pull request:

https://github.com/apache/drill/pull/1200#discussion_r179026313
  
--- Diff: distribution/src/resources/drillbit.sh ---
@@ -127,6 +127,44 @@ check_before_start()
   fi
 }
 
+check_after_start(){
+#check if the process is running
+if [ -f $pid ]; then
+  dbitProc=$(ps -ef | grep `cat $pid` | grep Drillbit)
+  if [ -n "$dbitProc" ]; then
+# Check and enforce for CGroup
+if [ -n "$DRILLBIT_CGROUP" ]; then 
+  check_and_enforce_cgroup `cat $pid`
+fi
+  fi
+fi
+}
+
+check_and_enforce_cgroup(){
+dbitPid=$1;
+#if [ $(`ps -o cgroup` | grep -c $DRILLBIT_CGROUP ) -eq 1 ]; then 
--- End diff --

Remove commented out line


---


[GitHub] drill pull request #1200: DRILL-143: Support CGROUPs resource management

2018-04-03 Thread paul-rogers
Github user paul-rogers commented on a diff in the pull request:

https://github.com/apache/drill/pull/1200#discussion_r179027582
  
--- Diff: distribution/src/resources/drillbit.sh ---
@@ -154,6 +192,7 @@ start_bit ( )
   nohup nice -n $DRILL_NICENESS "$DRILL_HOME/bin/runbit" exec ${args[@]} 
>> "$logout" 2>&1 &
   echo $! > $pid
   sleep 1
+  check_after_start
--- End diff --

To make things easier:

```
procId=$!
echo $procId > $pid # Yeah, $pid is a file, $procId is the pid...
sleep 1
check_after_start $procId
```

Also, we now have to naming conventions: `waitForProcessEnd` and 
`check_after_start`. Doesn't match which we choose, but we should stick with 
it. Where is check style for scripts?


---


[GitHub] drill pull request #1200: DRILL-143: Support CGROUPs resource management

2018-04-03 Thread paul-rogers
Github user paul-rogers commented on a diff in the pull request:

https://github.com/apache/drill/pull/1200#discussion_r179025617
  
--- Diff: distribution/src/resources/drill-env.sh ---
@@ -86,6 +86,12 @@
 
 #export DRILL_PID_DIR=${DRILL_PID_DIR:-$DRILL_HOME}
 
+# CGroup to which the Drillbit belong when running as a daemon using
--- End diff --

Also, maybe provide a bit more of a description. My guess is that the 
CGroup must already be set up? That we only enforce CPU? Or, is what we enforce 
determined entirely by what the user has configured? If so, wouldn't "drill" be 
a better CGroup name?


---


[GitHub] drill pull request #1200: DRILL-143: Support CGROUPs resource management

2018-04-03 Thread paul-rogers
Github user paul-rogers commented on a diff in the pull request:

https://github.com/apache/drill/pull/1200#discussion_r179028127
  
--- Diff: distribution/src/resources/drillbit.sh ---
@@ -127,6 +127,44 @@ check_before_start()
   fi
 }
 
+check_after_start(){
+#check if the process is running
+if [ -f $pid ]; then
+  dbitProc=$(ps -ef | grep `cat $pid` | grep Drillbit)
+  if [ -n "$dbitProc" ]; then
+# Check and enforce for CGroup
+if [ -n "$DRILLBIT_CGROUP" ]; then 
+  check_and_enforce_cgroup `cat $pid`
+fi
+  fi
+fi
+}
+
+check_and_enforce_cgroup(){
+dbitPid=$1;
+#if [ $(`ps -o cgroup` | grep -c $DRILLBIT_CGROUP ) -eq 1 ]; then 
+if [ -f /cgroup/cpu/${DRILLBIT_CGROUP}/cgroup.procs ]; then 
+  echo $dbitPid > /cgroup/cpu/${DRILLBIT_CGROUP}/cgroup.procs
+  # Verify Enforcement
+  cgroupStatus=`grep -w $pid 
/cgroup/cpu/${DRILLBIT_CGROUP}/cgroup.procs`
+  if [ -z "$cgroupStatus" ]; then
+#Ref: 
https://www.kernel.org/doc/Documentation/scheduler/sched-bwc.txt
+let cpu_quota=`cat /cgroup/cpu/${DRILLBIT_CGROUP}/cpu.cfs_quota_us`
+let cpu_period=`cat 
/cgroup/cpu/${DRILLBIT_CGROUP}/cpu.cfs_period_us`
+if [ $cpu_period -gt 0 ] && [ $cpu_quota -gt 0 ]; then
+  coresAllowed="(up to "`echo $(( 100 * $cpu_quota / $cpu_period 
)) | sed 's/..$/.&/'`" cores allowed)"
+fi
+echo "WARN: Drillbit's CPU resource usage will be managed under 
the CGroup : $DRILLBIT_CGROUP "$coresAllowed
+  else
+echo "ERROR: Failed to add Drillbit to CGroup ( $DRILLBIT_CGROUP ) 
for resource usage management.  Ensure that the cgroup manages CPU"
+exit 1
+  fi
+else
+  echo "ERROR: cgroup $DRILLBIT_CGROUP does not found. Ensure that 
daemon is running and cgroup exists"
+  exit 1
--- End diff --

Not sure we want to fail the script in this case. Failure means that the 
Drillbit did not start, but it did.


---


[GitHub] drill pull request #1200: DRILL-143: Support CGROUPs resource management

2018-04-03 Thread paul-rogers
Github user paul-rogers commented on a diff in the pull request:

https://github.com/apache/drill/pull/1200#discussion_r179028625
  
--- Diff: distribution/src/resources/drillbit.sh ---
@@ -127,6 +127,44 @@ check_before_start()
   fi
 }
 
+check_after_start(){
+#check if the process is running
+if [ -f $pid ]; then
+  dbitProc=$(ps -ef | grep `cat $pid` | grep Drillbit)
+  if [ -n "$dbitProc" ]; then
+# Check and enforce for CGroup
+if [ -n "$DRILLBIT_CGROUP" ]; then 
+  check_and_enforce_cgroup `cat $pid`
+fi
+  fi
+fi
+}
+
+check_and_enforce_cgroup(){
+dbitPid=$1;
+#if [ $(`ps -o cgroup` | grep -c $DRILLBIT_CGROUP ) -eq 1 ]; then 
+if [ -f /cgroup/cpu/${DRILLBIT_CGROUP}/cgroup.procs ]; then 
+  echo $dbitPid > /cgroup/cpu/${DRILLBIT_CGROUP}/cgroup.procs
+  # Verify Enforcement
+  cgroupStatus=`grep -w $pid 
/cgroup/cpu/${DRILLBIT_CGROUP}/cgroup.procs`
+  if [ -z "$cgroupStatus" ]; then
+#Ref: 
https://www.kernel.org/doc/Documentation/scheduler/sched-bwc.txt
+let cpu_quota=`cat /cgroup/cpu/${DRILLBIT_CGROUP}/cpu.cfs_quota_us`
--- End diff --

`let` is used only for math. Just say `cpu_quota=cat...`


---


[GitHub] drill pull request #1200: DRILL-143: Support CGROUPs resource management

2018-04-03 Thread paul-rogers
Github user paul-rogers commented on a diff in the pull request:

https://github.com/apache/drill/pull/1200#discussion_r179027761
  
--- Diff: distribution/src/resources/drillbit.sh ---
@@ -127,6 +127,44 @@ check_before_start()
   fi
 }
 
+check_after_start(){
+#check if the process is running
+if [ -f $pid ]; then
+  dbitProc=$(ps -ef | grep `cat $pid` | grep Drillbit)
+  if [ -n "$dbitProc" ]; then
+# Check and enforce for CGroup
+if [ -n "$DRILLBIT_CGROUP" ]; then 
+  check_and_enforce_cgroup `cat $pid`
+fi
+  fi
+fi
+}
+
+check_and_enforce_cgroup(){
+dbitPid=$1;
+#if [ $(`ps -o cgroup` | grep -c $DRILLBIT_CGROUP ) -eq 1 ]; then 
+if [ -f /cgroup/cpu/${DRILLBIT_CGROUP}/cgroup.procs ]; then 
+  echo $dbitPid > /cgroup/cpu/${DRILLBIT_CGROUP}/cgroup.procs
+  # Verify Enforcement
+  cgroupStatus=`grep -w $pid 
/cgroup/cpu/${DRILLBIT_CGROUP}/cgroup.procs`
+  if [ -z "$cgroupStatus" ]; then
+#Ref: 
https://www.kernel.org/doc/Documentation/scheduler/sched-bwc.txt
+let cpu_quota=`cat /cgroup/cpu/${DRILLBIT_CGROUP}/cpu.cfs_quota_us`
+let cpu_period=`cat 
/cgroup/cpu/${DRILLBIT_CGROUP}/cpu.cfs_period_us`
+if [ $cpu_period -gt 0 ] && [ $cpu_quota -gt 0 ]; then
+  coresAllowed="(up to "`echo $(( 100 * $cpu_quota / $cpu_period 
)) | sed 's/..$/.&/'`" cores allowed)"
+fi
+echo "WARN: Drillbit's CPU resource usage will be managed under 
the CGroup : $DRILLBIT_CGROUP "$coresAllowed
--- End diff --

Why is this a WARN? Isn't this exactly what I requested?

Maybe emit this as a message (no WARN) if a -v (verbose) flag is set.

Also, put the variable inside the quotes; bash is handy that way...

Message will be "...CGroup : drillcpu 5". Maybe, "CGroup drillcpu will 
limit Drill to 5 cpu(s)".


---


[GitHub] drill pull request #1200: DRILL-143: Support CGROUPs resource management

2018-04-03 Thread paul-rogers
Github user paul-rogers commented on a diff in the pull request:

https://github.com/apache/drill/pull/1200#discussion_r179028078
  
--- Diff: distribution/src/resources/drillbit.sh ---
@@ -127,6 +127,44 @@ check_before_start()
   fi
 }
 
+check_after_start(){
+#check if the process is running
+if [ -f $pid ]; then
+  dbitProc=$(ps -ef | grep `cat $pid` | grep Drillbit)
+  if [ -n "$dbitProc" ]; then
+# Check and enforce for CGroup
+if [ -n "$DRILLBIT_CGROUP" ]; then 
+  check_and_enforce_cgroup `cat $pid`
+fi
+  fi
+fi
+}
+
+check_and_enforce_cgroup(){
+dbitPid=$1;
+#if [ $(`ps -o cgroup` | grep -c $DRILLBIT_CGROUP ) -eq 1 ]; then 
+if [ -f /cgroup/cpu/${DRILLBIT_CGROUP}/cgroup.procs ]; then 
+  echo $dbitPid > /cgroup/cpu/${DRILLBIT_CGROUP}/cgroup.procs
+  # Verify Enforcement
+  cgroupStatus=`grep -w $pid 
/cgroup/cpu/${DRILLBIT_CGROUP}/cgroup.procs`
+  if [ -z "$cgroupStatus" ]; then
+#Ref: 
https://www.kernel.org/doc/Documentation/scheduler/sched-bwc.txt
+let cpu_quota=`cat /cgroup/cpu/${DRILLBIT_CGROUP}/cpu.cfs_quota_us`
+let cpu_period=`cat 
/cgroup/cpu/${DRILLBIT_CGROUP}/cpu.cfs_period_us`
+if [ $cpu_period -gt 0 ] && [ $cpu_quota -gt 0 ]; then
+  coresAllowed="(up to "`echo $(( 100 * $cpu_quota / $cpu_period 
)) | sed 's/..$/.&/'`" cores allowed)"
+fi
+echo "WARN: Drillbit's CPU resource usage will be managed under 
the CGroup : $DRILLBIT_CGROUP "$coresAllowed
+  else
+echo "ERROR: Failed to add Drillbit to CGroup ( $DRILLBIT_CGROUP ) 
for resource usage management.  Ensure that the cgroup manages CPU"
+exit 1
+  fi
+else
+  echo "ERROR: cgroup $DRILLBIT_CGROUP does not found. Ensure that 
daemon is running and cgroup exists"
--- End diff --

cgroup --> CGroup (or at least make the format consistent across messages)

does not found --> not found

We can check if the daemon is running using `kill -0 $pid`. If so, then we 
just tell the user something line "Check CGroup configuration."

Also, errors should go to stdout: `echo "msg" >&2`


---


[GitHub] drill pull request #1200: DRILL-143: Support CGROUPs resource management

2018-04-03 Thread paul-rogers
Github user paul-rogers commented on a diff in the pull request:

https://github.com/apache/drill/pull/1200#discussion_r179026462
  
--- Diff: distribution/src/resources/drillbit.sh ---
@@ -127,6 +127,44 @@ check_before_start()
   fi
 }
 
+check_after_start(){
+#check if the process is running
+if [ -f $pid ]; then
+  dbitProc=$(ps -ef | grep `cat $pid` | grep Drillbit)
+  if [ -n "$dbitProc" ]; then
+# Check and enforce for CGroup
+if [ -n "$DRILLBIT_CGROUP" ]; then 
+  check_and_enforce_cgroup `cat $pid`
+fi
+  fi
+fi
+}
+
+check_and_enforce_cgroup(){
+dbitPid=$1;
+#if [ $(`ps -o cgroup` | grep -c $DRILLBIT_CGROUP ) -eq 1 ]; then 
+if [ -f /cgroup/cpu/${DRILLBIT_CGROUP}/cgroup.procs ]; then 
--- End diff --

I believe cgroup setup is OS-specific. There are, AFAIK, multiple group 
versions.

For this reason, I wonder if this needs to have checks. Are there scripts 
we can crib from somewhere that do those checks?


---


[GitHub] drill pull request #1200: DRILL-143: Support CGROUPs resource management

2018-04-03 Thread paul-rogers
Github user paul-rogers commented on a diff in the pull request:

https://github.com/apache/drill/pull/1200#discussion_r179025777
  
--- Diff: distribution/src/resources/drillbit.sh ---
@@ -127,6 +127,44 @@ check_before_start()
   fi
 }
 
+check_after_start(){
+#check if the process is running
+if [ -f $pid ]; then
+  dbitProc=$(ps -ef | grep `cat $pid` | grep Drillbit)
--- End diff --

If you get this far, the pid must exist or the scripts are broken.

The only question is whether the process is still running. The chance that 
it is not is very low. Also, there is a race condition: the process may exit 
just after we check it. So, not sure it is even worth doing the check.

Finally, the standard way to check for the process is `kill -0 $pid`". See 
`waitForProcessEnd()`.


---


[GitHub] drill pull request #1200: DRILL-143: Support CGROUPs resource management

2018-04-03 Thread paul-rogers
Github user paul-rogers commented on a diff in the pull request:

https://github.com/apache/drill/pull/1200#discussion_r179028377
  
--- Diff: distribution/src/resources/drillbit.sh ---
@@ -127,6 +127,44 @@ check_before_start()
   fi
 }
 
+check_after_start(){
+#check if the process is running
+if [ -f $pid ]; then
+  dbitProc=$(ps -ef | grep `cat $pid` | grep Drillbit)
+  if [ -n "$dbitProc" ]; then
+# Check and enforce for CGroup
+if [ -n "$DRILLBIT_CGROUP" ]; then 
+  check_and_enforce_cgroup `cat $pid`
+fi
+  fi
+fi
+}
+
+check_and_enforce_cgroup(){
+dbitPid=$1;
+#if [ $(`ps -o cgroup` | grep -c $DRILLBIT_CGROUP ) -eq 1 ]; then 
+if [ -f /cgroup/cpu/${DRILLBIT_CGROUP}/cgroup.procs ]; then 
+  echo $dbitPid > /cgroup/cpu/${DRILLBIT_CGROUP}/cgroup.procs
+  # Verify Enforcement
+  cgroupStatus=`grep -w $pid 
/cgroup/cpu/${DRILLBIT_CGROUP}/cgroup.procs`
+  if [ -z "$cgroupStatus" ]; then
+#Ref: 
https://www.kernel.org/doc/Documentation/scheduler/sched-bwc.txt
+let cpu_quota=`cat /cgroup/cpu/${DRILLBIT_CGROUP}/cpu.cfs_quota_us`
+let cpu_period=`cat 
/cgroup/cpu/${DRILLBIT_CGROUP}/cpu.cfs_period_us`
+if [ $cpu_period -gt 0 ] && [ $cpu_quota -gt 0 ]; then
+  coresAllowed="(up to "`echo $(( 100 * $cpu_quota / $cpu_period 
)) | sed 's/..$/.&/'`" cores allowed)"
+fi
+echo "WARN: Drillbit's CPU resource usage will be managed under 
the CGroup : $DRILLBIT_CGROUP "$coresAllowed
+  else
+echo "ERROR: Failed to add Drillbit to CGroup ( $DRILLBIT_CGROUP ) 
for resource usage management.  Ensure that the cgroup manages CPU"
+exit 1
--- End diff --

See below about `exit 1`


---


[GitHub] drill pull request #1200: DRILL-143: Support CGROUPs resource management

2018-04-03 Thread paul-rogers
Github user paul-rogers commented on a diff in the pull request:

https://github.com/apache/drill/pull/1200#discussion_r179025400
  
--- Diff: distribution/src/resources/drill-env.sh ---
@@ -86,6 +86,12 @@
 
 #export DRILL_PID_DIR=${DRILL_PID_DIR:-$DRILL_HOME}
 
+# CGroup to which the Drillbit belong when running as a daemon using
--- End diff --

belong --> belongs


---


[GitHub] drill pull request #1200: DRILL-143: Support CGROUPs resource management

2018-04-03 Thread paul-rogers
Github user paul-rogers commented on a diff in the pull request:

https://github.com/apache/drill/pull/1200#discussion_r179027910
  
--- Diff: distribution/src/resources/drillbit.sh ---
@@ -127,6 +127,44 @@ check_before_start()
   fi
 }
 
+check_after_start(){
+#check if the process is running
+if [ -f $pid ]; then
+  dbitProc=$(ps -ef | grep `cat $pid` | grep Drillbit)
+  if [ -n "$dbitProc" ]; then
+# Check and enforce for CGroup
+if [ -n "$DRILLBIT_CGROUP" ]; then 
+  check_and_enforce_cgroup `cat $pid`
+fi
+  fi
+fi
+}
+
+check_and_enforce_cgroup(){
+dbitPid=$1;
+#if [ $(`ps -o cgroup` | grep -c $DRILLBIT_CGROUP ) -eq 1 ]; then 
+if [ -f /cgroup/cpu/${DRILLBIT_CGROUP}/cgroup.procs ]; then 
+  echo $dbitPid > /cgroup/cpu/${DRILLBIT_CGROUP}/cgroup.procs
+  # Verify Enforcement
+  cgroupStatus=`grep -w $pid 
/cgroup/cpu/${DRILLBIT_CGROUP}/cgroup.procs`
+  if [ -z "$cgroupStatus" ]; then
+#Ref: 
https://www.kernel.org/doc/Documentation/scheduler/sched-bwc.txt
+let cpu_quota=`cat /cgroup/cpu/${DRILLBIT_CGROUP}/cpu.cfs_quota_us`
+let cpu_period=`cat 
/cgroup/cpu/${DRILLBIT_CGROUP}/cpu.cfs_period_us`
+if [ $cpu_period -gt 0 ] && [ $cpu_quota -gt 0 ]; then
+  coresAllowed="(up to "`echo $(( 100 * $cpu_quota / $cpu_period 
)) | sed 's/..$/.&/'`" cores allowed)"
+fi
+echo "WARN: Drillbit's CPU resource usage will be managed under 
the CGroup : $DRILLBIT_CGROUP "$coresAllowed
+  else
+echo "ERROR: Failed to add Drillbit to CGroup ( $DRILLBIT_CGROUP ) 
for resource usage management.  Ensure that the cgroup manages CPU"
--- End diff --

Inconsistent format: parens here, colon above. Actually, neither are needed.

Also, "for resource usage management" --> "for cpu"


---


[GitHub] drill pull request #1200: DRILL-143: Support CGROUPs resource management

2018-04-03 Thread Ben-Zvi
Github user Ben-Zvi commented on a diff in the pull request:

https://github.com/apache/drill/pull/1200#discussion_r179005732
  
--- Diff: distribution/src/resources/drillbit.sh ---
@@ -127,6 +127,44 @@ check_before_start()
   fi
 }
 
+check_after_start(){
+#check if the process is running
+if [ -f $pid ]; then
+  dbitProc=$(ps -ef | grep `cat $pid` | grep Drillbit)
+  if [ -n "$dbitProc" ]; then
+# Check and enforce for CGroup
+if [ -n "$DRILLBIT_CGROUP" ]; then 
+  check_and_enforce_cgroup `cat $pid`
+fi
+  fi
+fi
+}
+
+check_and_enforce_cgroup(){
+dbitPid=$1;
+#if [ $(`ps -o cgroup` | grep -c $DRILLBIT_CGROUP ) -eq 1 ]; then 
+if [ -f /cgroup/cpu/${DRILLBIT_CGROUP}/cgroup.procs ]; then 
--- End diff --

Why is the path hardcoded ? Is it a standard ?  I checked on my cluster, 
where /sys/fs/cgroup/. is used instead.



---


[GitHub] drill pull request #1200: DRILL-143: Support CGROUPs resource management

2018-04-03 Thread kkhatua
GitHub user kkhatua opened a pull request:

https://github.com/apache/drill/pull/1200

DRILL-143: Support CGROUPs resource management

Introduces the `DRILLBIT_CGROUP` option in defined in `drill-env.sh`
The startup script checks if the specified CGroup (ver 2) is available and 
tries to apply it to the launched Drillbit JVM.
This would benefit not just Drill-on-YARN usecases, but  any setup that 
would like CGroups for enforcement of (cpu) resources management.

e.g when Drillbit is configured to use `drillcpu` cgroup
```
[root@maprlabs ~]# 
/opt/mapr/drill/apache-drill-1.14.0-SNAPSHOT/bin/drillbit.sh restart
Stopping drillbit
..
Starting drillbit, logging to /var/log/drill/drillbit.out
WARN: Drillbit's CPU resource usage will be managed under the CGroup : 
drillcpu (up to 4.00 cores allowed)
```

e.g. Non-existent CGroup `droolcpu` is used
```
[root@maprlabs ~]# 
/opt/mapr/drill/apache-drill-1.14.0-SNAPSHOT/bin/drillbit.sh restart
Stopping drillbit
..
Starting drillbit, logging to /var/log/drill/drillbit.out
ERROR: cgroup droolcpu does not found. Ensure that daemon is running and 
cgroup exists
```

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/kkhatua/drill DRILL-143

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/drill/pull/1200.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1200


commit e9a2551b315b4395c5227dad017f2b4340f41108
Author: Kunal Khatua 
Date:   2018-04-03T05:35:53Z

DRILL-143: Support CGROUPs resource management

Introduces the DRILLBIT_CGROUP option in drill-env.sh.
The startup script checks if the specified CGroup (ver 2) is available and 
tries to apply it to the launched Drillbit JVM.
This would benefit not just Drill-on-YARN usecases, but  any setup that 
would like CGroups for enforcement of (cpu) resources management.

e.g when Drillbit is configured to use `drillcpu` cgroup
```
[root@maprlabs ~]# 
/opt/mapr/drill/apache-drill-1.14.0-SNAPSHOT/bin/drillbit.sh restart
Stopping drillbit
..
Starting drillbit, logging to /var/log/drill/drillbit.out
WARN: Drillbit's CPU resource usage will be managed under the CGroup : 
drillcpu (up to 4.00 cores allowed)
```

e.g. Non-existent CGroup `droolcpu` is used
```
[root@kk127 ~]# 
/opt/mapr/drill/apache-drill-1.14.0-SNAPSHOT/bin/drillbit.sh restart
Stopping drillbit
..
Starting drillbit, logging to /var/log/drill/drillbit.out
ERROR: cgroup droolcpu does not found. Ensure that daemon is running and 
cgroup exists
```




---