Added: release/metron/0.4.1/site-book/metron-analytics/metron-profiler/index.html
==============================================================================
--- release/metron/0.4.1/site-book/metron-analytics/metron-profiler/index.html (added)
+++ release/metron/0.4.1/site-book/metron-analytics/metron-profiler/index.html Fri Sep 15 23:37:46 2017
@@ -0,0 +1,1012 @@
+<!DOCTYPE html>
+<!--
+ | Generated by Apache Maven Doxia at 2017-09-08
+ | Rendered using Apache Maven Fluido Skin 1.3.0
+-->
+<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
+  <head>
+    <meta charset="UTF-8" />
+    <meta name="viewport" content="width=device-width, initial-scale=1.0" />
+    <meta name="Date-Revision-yyyymmdd" content="20170908" />
+    <meta http-equiv="Content-Language" content="en" />
+    <title>Metron &#x2013; Metron Profiler</title>
+    <link rel="stylesheet" href="../../css/apache-maven-fluido-1.3.0.min.css" />
+    <link rel="stylesheet" href="../../css/site.css" />
+    <link rel="stylesheet" href="../../css/print.css" media="print" />
+
+      
+    <script type="text/javascript" src="../../js/apache-maven-fluido-1.3.0.min.js"></script>
+
+                          
+        
+<script type="text/javascript">$( document ).ready( function() { $( '.carousel' ).carousel( { interval: 3500 } ) } );</script>
+          
+            </head>
+        <body class="topBarDisabled">
+          
+                
+                    
+    
+        <div class="container-fluid">
+          <div id="banner">
+        <div class="pull-left">
+                                    <a href="http://metron.apache.org/" id="bannerLeft">
+                  <img src="../../images/metron-logo.png"  alt="Apache Metron" width="148px" height="48px"/>
+                </a>
+                      </div>
+        <div class="pull-right">  </div>
+        <div class="clear"><hr/></div>
+      </div>
+
+      <div id="breadcrumbs">
+        <ul class="breadcrumb">
+                
+                    
+                              <li class="">
+                    <a href="http://www.apache.org" class="externalLink" title="Apache">
+        Apache</a>
+        </li>
+      <li class="divider ">/</li>
+            <li class="">
+                    <a href="http://metron.apache.org/" class="externalLink" title="Metron">
+        Metron</a>
+        </li>
+      <li class="divider ">/</li>
+            <li class="">
+                    <a href="../../index.html" title="Documentation">
+        Documentation</a>
+        </li>
+      <li class="divider ">/</li>
+        <li class="">Metron Profiler</li>
+        
+                
+                    
+                  <li id="publishDate" class="pull-right">Last Published: 2017-09-08</li> <li class="divider pull-right">|</li>
+              <li id="projectVersion" class="pull-right">Version: 0.4.1</li>
+            
+                            </ul>
+      </div>
+
+            
+      <div class="row-fluid">
+        <div id="leftColumn" class="span3">
+          <div class="well sidebar-nav">
+                
+                    
+                <ul class="nav nav-list">
+                    <li class="nav-header">User Documentation</li>
+
+      <li>
+    
+                          <a href="../../index.html" title="Metron">
+          <i class="icon-chevron-down"></i>
+        Metron</a>
+                    <ul class="nav nav-list">
+                      
+      <li>
+    
+                          <a href="../../Upgrading.html" title="Upgrading">
+          <i class="none"></i>
+        Upgrading</a>
+            </li>
+
+      <li>
+    
+                          <a href="../../metron-analytics/index.html" title="Analytics">
+          <i class="icon-chevron-down"></i>
+        Analytics</a>
+                    <ul class="nav nav-list">
+                      
+      <li>
+    
+                          <a href="../../metron-analytics/metron-maas-service/index.html" title="Maas-service">
+          <i class="none"></i>
+        Maas-service</a>
+            </li>
+                      
+      <li class="active">
+    
+            <a href="#"><i class="none"></i>Profiler</a>
+          </li>
+                      
+      <li>
+    
+                          <a href="../../metron-analytics/metron-profiler-client/index.html" title="Profiler-client">
+          <i class="none"></i>
+        Profiler-client</a>
+            </li>
+                                                                        
+      <li>
+    
+                          <a href="../../metron-analytics/metron-statistics/index.html" title="Statistics">
+          <i class="icon-chevron-right"></i>
+        Statistics</a>
+                  </li>
+              </ul>
+        </li>
+                      
+      <li>
+    
+                          <a href="../../metron-contrib/metron-docker/index.html" title="Docker">
+          <i class="none"></i>
+        Docker</a>
+            </li>
+
+      <li>
+    
+                          <a href="../../metron-deployment/index.html" title="Deployment">
+          <i class="icon-chevron-right"></i>
+        Deployment</a>
+                  </li>
+                      
+      <li>
+    
+                          <a href="../../metron-interface/metron-alerts/index.html" title="Alerts">
+          <i class="none"></i>
+        Alerts</a>
+            </li>
+                      
+      <li>
+    
+                          <a href="../../metron-interface/metron-config/index.html" title="Config">
+          <i class="none"></i>
+        Config</a>
+            </li>
+                      
+      <li>
+    
+                          <a href="../../metron-interface/metron-rest/index.html" title="Rest">
+          <i class="none"></i>
+        Rest</a>
+            </li>
+
+      <li>
+    
+                          <a href="../../metron-platform/index.html" title="Platform">
+          <i class="icon-chevron-right"></i>
+        Platform</a>
+                  </li>
+
+      <li>
+    
+                          <a href="../../metron-sensors/index.html" title="Sensors">
+          <i class="icon-chevron-right"></i>
+        Sensors</a>
+                  </li>
+                                                                        
+      <li>
+    
+                          <a href="../../metron-stellar/stellar-common/index.html" title="Stellar-common">
+          <i class="icon-chevron-right"></i>
+        Stellar-common</a>
+                  </li>
+                                                                        
+      <li>
+    
+                          <a href="../../use-cases/index.html" title="Use-cases">
+          <i class="icon-chevron-right"></i>
+        Use-cases</a>
+                  </li>
+              </ul>
+        </li>
+            </ul>
+                
+                    
+                
+          <hr class="divider" />
+
+           <div id="poweredBy">
+                            <div class="clear"></div>
+                            <div class="clear"></div>
+                            <div class="clear"></div>
+                             <a href="http://maven.apache.org/" title="Built by Maven" class="poweredBy">
+        <img class="builtBy" alt="Built by Maven" src="../../images/logos/maven-feather.png" />
+      </a>
+                  </div>
+          </div>
+        </div>
+        
+                
+        <div id="bodyColumn"  class="span9" >
+                                  
+            <h1>Metron Profiler</h1>
+<p><a name="Metron_Profiler"></a></p>
+<p>The Profiler is a feature extraction mechanism that can generate a profile describing the behavior of an entity. An entity might be a server, user, subnet, or application. Once a profile has been generated that defines what normal behavior looks like, models can be built that identify anomalous behavior.</p>
+<p>This is achieved by summarizing the streaming telemetry data consumed by Metron over sliding windows. A summary statistic is applied to the data received within a given window. Collecting this summary across many windows results in a time series that is useful for analysis.</p>
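+<p>The summarization described above can be sketched in a few lines of Python. This is an illustration rather than how the Profiler is implemented: it assumes fixed, non-overlapping periods and messages given as (timestamp in epoch milliseconds, value) pairs, and the <tt>summarize</tt> function is hypothetical.</p>

```python
from collections import defaultdict
from statistics import mean

def summarize(messages, period_ms, statistic):
    """Apply `statistic` to the values received in each fixed-length period.

    messages: iterable of (timestamp_ms, value) pairs.
    Returns [(period_start_ms, summary), ...] in time order, i.e. a time series.
    """
    buckets = defaultdict(list)
    for timestamp_ms, value in messages:
        # Align each timestamp to the start of the period that contains it.
        period_start = (timestamp_ms // period_ms) * period_ms
        buckets[period_start].append(value)
    return [(start, statistic(values)) for start, values in sorted(buckets.items())]

PERIOD_MS = 15 * 60 * 1000  # a 15-minute period in milliseconds
messages = [(0, 10), (60_000, 30), (PERIOD_MS + 5, 7)]
counts = summarize(messages, PERIOD_MS, len)     # one count per period
averages = summarize(messages, PERIOD_MS, mean)  # one mean per period
print(counts)    # [(0, 2), (900000, 1)]
```

+<p>Swapping the statistic (a count, a mean, and so on) changes what the resulting time series describes, while the bucketing by period stays the same.</p>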
+<p>Any field contained within a message can be used to generate a profile. A profile can even be produced by combining fields that originate in different data sources. A user has considerable power to transform the data used in a profile by leveraging the Stellar language. A user need only configure the desired profiles and ensure that the Profiler topology is running.</p>
+
+<ul>
+  
+<li><a href="#Installation">Installation</a></li>
+  
+<li><a href="#Getting_Started">Getting Started</a></li>
+  
+<li><a href="#Creating_Profiles">Creating Profiles</a></li>
+  
+<li><a href="#Configuring_the_Profiler">Configuring the Profiler</a></li>
+  
+<li><a href="#Examples">Examples</a></li>
+  
+<li><a href="#Implementation">Implementation</a></li>
+</ul>
+<div class="section">
+<h2><a name="Installation"></a>Installation</h2>
+<p>Follow these instructions to install the Profiler. This assumes that core Metron has already been installed and validated. </p>
+
+<ol style="list-style-type: decimal">
+  
+<li>
+<p>Build the Metron RPMs (see Building the <a href="../../metron-deployment/index.html#RPMs">RPMs</a>).</p>
+<p>You may have already built the Metron RPMs when core Metron was installed.</p>
+  
+<div class="source">
+<div class="source">
+<pre>$ find metron-deployment/ -name &quot;metron-profiler*.rpm&quot;
+metron-deployment//packaging/docker/rpm-docker/RPMS/noarch/metron-profiler-0.4.1-201707131420.noarch.rpm
+</pre></div></div></li>
+  
+<li>
+<p>Copy the Profiler RPM to the installation host. </p>
+<p>The installation host must be the same host on which core Metron was installed. Depending on how you installed Metron, the Profiler RPM might have already been copied to this host with the other Metron RPMs.</p>
+  
+<div class="source">
+<div class="source">
+<pre>[root@node1 ~]# find /localrepo/  -name &quot;metron-profiler*.rpm&quot;
+/localrepo/metron-profiler-0.4.0-201707112313.noarch.rpm
+</pre></div></div></li>
+  
+<li>
+<p>Install the RPM.</p>
+  
+<div class="source">
+<div class="source">
+<pre>[root@node1 ~]# rpm -ivh metron-profiler-*.noarch.rpm
+Preparing...                ########################################### [100%]
+   1:metron-profiler        ########################################### [100%]
+</pre></div></div>
+  
+<div class="source">
+<div class="source">
+<pre>[root@node1 ~]# rpm -ql metron-profiler
+/usr/metron
+/usr/metron/0.4.1
+/usr/metron/0.4.1/bin
+/usr/metron/0.4.1/bin/start_profiler_topology.sh
+/usr/metron/0.4.1/config
+/usr/metron/0.4.1/config/profiler.properties
+/usr/metron/0.4.1/flux
+/usr/metron/0.4.1/flux/profiler
+/usr/metron/0.4.1/flux/profiler/remote.yaml
+/usr/metron/0.4.1/lib
+/usr/metron/0.4.1/lib/metron-profiler-0.4.0-uber.jar
+</pre></div></div></li>
+  
+<li>
+<p>Create a table within HBase that will store the profile data. By default, the table is named <tt>profiler</tt> with a column family <tt>P</tt>. The table name and column family must match the Profiler&#x2019;s configuration (see <a href="#Configuring_the_Profiler">Configuring the Profiler</a>). </p>
+  
+<div class="source">
+<div class="source">
+<pre>$ /usr/hdp/current/hbase-client/bin/hbase shell
+hbase(main):001:0&gt; create 'profiler', 'P'
+</pre></div></div></li>
+  
+<li>
+<p>Edit the configuration file located at <tt>$METRON_HOME/config/profiler.properties</tt>. </p>
+  
+<div class="source">
+<div class="source">
+<pre>kafka.zk=node1:2181
+kafka.broker=node1:6667
+</pre></div></div>
+<p>Change <tt>kafka.zk</tt> to refer to Zookeeper in your environment.<br />Change <tt>kafka.broker</tt> to refer to a Kafka Broker in your environment.</p></li>
+  
+<li>
+<p>Start the Profiler topology.</p>
+  
+<div class="source">
+<div class="source">
+<pre>$ cd $METRON_HOME
+$ bin/start_profiler_topology.sh
+</pre></div></div></li>
+</ol>
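+<p>Before starting the topology, it can be worth sanity-checking the two settings edited in <tt>profiler.properties</tt>. The Python helper below is hypothetical and not part of Metron; it reads only the simple <tt>key=value</tt> lines shown above and checks that each value looks like one or more comma-separated <tt>host:port</tt> pairs.</p>

```python
def parse_properties(text):
    """Parse simple key=value lines into a dict.

    Blank lines and lines starting with '#' or '!' are skipped. This covers only a
    small subset of the Java .properties format (no continuations or escapes).
    """
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", "!")):
            continue
        key, sep, value = line.partition("=")
        if not sep:
            raise ValueError(f"expected key=value, got: {raw!r}")
        props[key.strip()] = value.strip()
    return props

def looks_like_host_ports(value):
    """True if value is one or more comma-separated host:port pairs."""
    for item in value.split(","):
        host, sep, port = item.strip().rpartition(":")
        if not (sep and host and port.isdigit()):
            return False
    return True

config = parse_properties("""
# values edited for this environment
kafka.zk=node1:2181
kafka.broker=node1:6667
""")
for key in ("kafka.zk", "kafka.broker"):
    print(key, config[key], looks_like_host_ports(config[key]))
```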
+<p>At this point the Profiler is running and consuming telemetry messages. We have not defined any profiles yet, so it is not doing anything very useful. The next section walks you through the steps to create your very first &#x201c;Hello, World!&#x201d; profile.</p></div>
+<div class="section">
+<h2><a name="Getting_Started"></a>Getting Started</h2>
+<p>This section describes the steps required to get your first &#x201c;Hello, World!&#x201d; profile running. This assumes that you have a successful Profiler <a href="#Installation">Installation</a> and that the Profiler is running.</p>
+
+<ol style="list-style-type: decimal">
+  
+<li>
+<p>Create the profile definition in a file located at <tt>$METRON_HOME/config/zookeeper/profiler.json</tt>. This file likely does not exist if you have never created profiles before.</p>
+<p>The following example creates a profile that simply counts the number of messages per <tt>ip_src_addr</tt>.</p>
+  
+<div class="source">
+<div class="source">
+<pre>{
+  &quot;profiles&quot;: [
+    {
+      &quot;profile&quot;: &quot;hello-world&quot;,
+      &quot;onlyif&quot;:  &quot;exists(ip_src_addr)&quot;,
+      &quot;foreach&quot;: &quot;ip_src_addr&quot;,
+      &quot;init&quot;:    { &quot;count&quot;: &quot;0&quot; },
+      &quot;update&quot;:  { &quot;count&quot;: &quot;count + 1&quot; },
+      &quot;result&quot;:  &quot;count&quot;
+    }
+  ]
+}
+</pre></div></div></li>
+  
+<li>
+<p>Upload the profile definition to Zookeeper. Change <tt>node1:2181</tt> to refer to the actual Zookeeper host in your environment.</p>
+  
+<div class="source">
+<div class="source">
+<pre>$ cd $METRON_HOME
+$ bin/zk_load_configs.sh -m PUSH -i config/zookeeper/ -z node1:2181
+</pre></div></div>
+<p>You can validate this by reading back the Metron configuration from Zookeeper using the same script. The result should look like the following.</p>
+  
+<div class="source">
+<div class="source">
+<pre>$ bin/zk_load_configs.sh -m DUMP -z node1:2181
+...
+PROFILER Config: profiler
+{
+  &quot;profiles&quot;: [
+    {
+      &quot;profile&quot;: &quot;hello-world&quot;,
+      &quot;onlyif&quot;:  &quot;exists(ip_src_addr)&quot;,
+      &quot;foreach&quot;: &quot;ip_src_addr&quot;,
+      &quot;init&quot;:    { &quot;count&quot;: &quot;0&quot; },
+      &quot;update&quot;:  { &quot;count&quot;: &quot;count + 1&quot; },
+      &quot;result&quot;:  &quot;count&quot;
+    }
+  ]
+}
+</pre></div></div></li>
+  
+<li>
+<p>Ensure that test messages are being sent to the Profiler&#x2019;s input topic in Kafka. The Profiler consumes messages from the input topic defined in the Profiler&#x2019;s configuration (see <a href="#Configuring_the_Profiler">Configuring the Profiler</a>). By default this is the <tt>indexing</tt> topic.</p></li>
+  
+<li>
+<p>Check the HBase table to validate that the Profiler is writing the profile. By default, the Profiler flushes the profile every 15 minutes (see <a href="#profiler.period.duration"><tt>profiler.period.duration</tt></a>), so you will need to wait at least that long before profile data appears in HBase.</p>
+  
+<div class="source">
+<div class="source">
+<pre>$ /usr/hdp/current/hbase-client/bin/hbase shell
+hbase(main):001:0&gt; count 'profiler'
+</pre></div></div></li>
+  
+<li>
+<p>Use the <a href="../metron-profiler-client/index.html">Profiler Client</a> to read the profile data. The following <tt>PROFILE_GET</tt> command will read the data written by the <tt>hello-world</tt> profile. This assumes that <tt>10.0.0.1</tt> is one of the values for <tt>ip_src_addr</tt> contained within the telemetry consumed by the Profiler.</p>
+  
+<div class="source">
+<div class="source">
+<pre>$ bin/stellar -z node1:2181
+[Stellar]&gt;&gt;&gt; PROFILE_GET( &quot;hello-world&quot;, &quot;10.0.0.1&quot;, PROFILE_FIXED(30, &quot;MINUTES&quot;))
+[451, 448]
+</pre></div></div>
+<p>This result indicates that over the past 30 minutes, the Profiler stored two values related to the source IP address &#x201c;10.0.0.1&#x201d;. In the first 15-minute period, the IP <tt>10.0.0.1</tt> was seen in 451 telemetry messages. In the second 15-minute period, the same IP was seen in 448 telemetry messages.</p>
+<p>It is assumed that the <tt>PROFILE_GET</tt> client is correctly configured to match the Profiler&#x2019;s configuration before it is used to read a profile. More information on configuring and using the Profiler Client can be found <a href="../metron-profiler-client/index.html">here</a>. </p></li>
+</ol></div>
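+<p>The lifecycle of the hello-world profile can be mimicked in Python to show how <tt>onlyif</tt>, <tt>foreach</tt>, <tt>init</tt>, <tt>update</tt>, and <tt>result</tt> fit together. This is a hypothetical in-memory sketch, not the Profiler's implementation: the Stellar expressions are replaced by equivalent Python, <tt>exists(ip_src_addr)</tt> is approximated by a non-null check, and periods are assumed to be fixed 15-minute buckets keyed by a message timestamp.</p>

```python
from collections import defaultdict

PERIOD_MS = 15 * 60 * 1000  # the default 15-minute period, in milliseconds

def run_hello_world(messages, period_ms=PERIOD_MS):
    """Mimic the hello-world profile: count messages per ip_src_addr per period.

    messages: dicts with an epoch-millisecond 'timestamp' and, optionally, 'ip_src_addr'.
    Returns {entity: [count for each period that saw messages, in time order]}.
    """
    state = {}  # (entity, period index) -> profile variables
    for msg in messages:
        if msg.get("ip_src_addr") is None:       # onlyif: exists(ip_src_addr)
            continue
        entity = msg["ip_src_addr"]              # foreach: ip_src_addr
        key = (entity, msg["timestamp"] // period_ms)
        if key not in state:
            state[key] = {"count": 0}            # init: {"count": "0"}
        state[key]["count"] += 1                 # update: {"count": "count + 1"}
    results = defaultdict(list)
    for entity, period in sorted(state):
        results[entity].append(state[(entity, period)]["count"])  # result: "count"
    return dict(results)

msgs = [
    {"timestamp": 0, "ip_src_addr": "10.0.0.1"},
    {"timestamp": 1_000, "ip_src_addr": "10.0.0.1"},
    {"timestamp": PERIOD_MS, "ip_src_addr": "10.0.0.1"},
    {"timestamp": 2_000, "ip_src_addr": "10.0.0.2"},
    {"timestamp": 3_000},  # no ip_src_addr, so onlyif filters it out
]
print(run_hello_world(msgs))  # {'10.0.0.1': [2, 1], '10.0.0.2': [1]}
```

+<p>Each list is similar in shape to the <tt>[451, 448]</tt> result returned by <tt>PROFILE_GET</tt> above: one value per period for a single entity.</p>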
+<div class="section">
+<h2><a name="Creating_Profiles"></a>Creating Profiles</h2>
+<p>The Profiler specification requires a JSON-formatted set of elements, many of which can contain Stellar code. The specification contains the following elements. (For the impatient, skip ahead to the <a href="#Examples">Examples</a>.) The specification for the Profiler topology is stored in Zookeeper at <tt>/metron/topology/profiler</tt>. A copy also exists in the local filesystem at <tt>$METRON_HOME/config/zookeeper/profiler.json</tt>. The values can be changed on disk and then uploaded to Zookeeper using <tt>$METRON_HOME/bin/zk_load_configs.sh</tt>.</p>
+
+<table border="0" class="table table-striped">
+  <thead>
+    
+<tr class="a">
+      
+<th>Name </th>
+      
+<th> </th>
+      
+<th>Description</th>
+    </tr>
+  </thead>
+  <tbody>
+    
+<tr class="b">
+      
+<td><a href="#profile">profile</a> </td>
+      
+<td>Required </td>
+      
+<td>Unique name identifying the profile.</td>
+    </tr>
+    
+<tr class="a">
+      
+<td><a href="#foreach">foreach</a> </td>
+      
+<td>Required </td>
+      
+<td>A separate profile is maintained &#x201c;for each&#x201d; of these.</td>
+    </tr>
+    
+<tr class="b">
+      
+<td><a href="#onlyif">onlyif</a> </td>
+      
+<td>Optional </td>
+      
+<td>Boolean expression that determines if a message should be applied to the profile.</td>
+    </tr>
+    
+<tr class="a">
+      
+<td><a href="#groupBy">groupBy</a> </td>
+      
+<td>Optional </td>
+      
+<td>One or more Stellar expressions used to group the profile measurements when persisted.</td>
+    </tr>
+    
+<tr class="b">
+      
+<td><a href="#init">init</a> </td>
+      
+<td>Optional </td>
+      
+<td>One or more expressions executed at the start of a window period.</td>
+    </tr>
+    
+<tr class="a">
+      
+<td><a href="#update">update</a> </td>
+      
+<td>Required </td>
+      
+<td>One or more expressions executed when a message is applied to the profile.</td>
+    </tr>
+    
+<tr class="b">
+      
+<td><a href="#result">result</a> </td>
+      
+<td>Required </td>
+      
+<td>Stellar expressions that are executed when the window period expires.</td>
+    </tr>
+    
+<tr class="a">
+      
+<td><a href="#expires">expires</a> </td>
+      
+<td>Optional </td>
+      
+<td>Profile data is purged after this period of time, specified in days.</td>
+    </tr>
+  </tbody>
+</table>
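+<p>The table above can double as a checklist. The following Python sketch is a hypothetical validator, not a Metron tool: it reports missing required elements, unrecognized element names (such as a misspelled <tt>expire</tt>), and duplicate profile names. It treats the eight elements in the table as the complete set, which may not hold for other Metron versions.</p>

```python
import json

REQUIRED = ("profile", "foreach", "update", "result")
OPTIONAL = ("onlyif", "groupBy", "init", "expires")

def check_profiles(document):
    """Return a list of problems found in a profiler.json document (a JSON string)."""
    problems = []
    seen_names = set()
    for i, profile in enumerate(json.loads(document).get("profiles", [])):
        for field in REQUIRED:
            if field not in profile:
                problems.append(f"profile #{i}: missing required field '{field}'")
        for field in profile:
            if field not in REQUIRED + OPTIONAL:
                problems.append(f"profile #{i}: unknown field '{field}'")
        name = profile.get("profile")
        if name is not None and name in seen_names:
            problems.append(f"profile #{i}: duplicate name '{name}'")
        seen_names.add(name)
    return problems

hello_world = json.dumps({"profiles": [{
    "profile": "hello-world", "onlyif": "exists(ip_src_addr)", "foreach": "ip_src_addr",
    "init": {"count": "0"}, "update": {"count": "count + 1"}, "result": "count"}]})
broken = json.dumps({"profiles": [
    {"profile": "p1", "foreach": "ip_src_addr", "result": "1"},
    {"profile": "p1", "foreach": "ip_src_addr", "update": {}, "result": "1", "expire": 30}]})
print(check_profiles(hello_world))  # []
print(check_profiles(broken))
```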
+<div class="section">
+<h3><a name="profile"></a><tt>profile</tt></h3>
+<p><i>Required</i></p>
+<p>A unique name identifying the profile. The field is treated as a string.</p></div>
+<div class="section">
+<h3><a name="foreach"></a><tt>foreach</tt></h3>
+<p><i>Required</i></p>
+<p>A separate profile is maintained &#x2018;for each&#x2019; of these. This is effectively the entity that the profile is describing. The field is expected to contain a Stellar expression whose result is the entity name. </p>
+<p>For example, if the expression is <tt>ip_src_addr</tt>, a separate profile is maintained for each unique source IP address in the data: 10.0.0.1, 10.0.0.2, and so on.</p></div>
+<div class="section">
+<h3><a name="onlyif"></a><tt>onlyif</tt></h3>
+<p><i>Optional</i></p>
+<p>An expression that determines if a message should be applied to the profile. A Stellar expression that returns a Boolean is expected. A message is only applied to a profile if this expression is true. This allows a profile to filter the messages that get applied to it.</p></div>
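+<p>The interplay of <tt>onlyif</tt> and <tt>foreach</tt> can be sketched as follows: <tt>onlyif</tt> decides whether a message is applied at all, and <tt>foreach</tt> picks the entity, and therefore the separate profile, that it is applied to. The Python callables here are hypothetical stand-ins for Stellar expressions, and <tt>onlyif=None</tt> models a profile that omits the optional <tt>onlyif</tt>.</p>

```python
def route(message, foreach, onlyif=None):
    """Return the entity a message is applied to, or None if onlyif rejects it."""
    if onlyif is not None and not onlyif(message):
        return None
    return foreach(message)

def by_source_ip(msg):   # stands in for "foreach": "ip_src_addr"
    return msg["ip_src_addr"]

def has_source_ip(msg):  # stands in for "onlyif": "exists(ip_src_addr)"
    return msg.get("ip_src_addr") is not None

print(route({"ip_src_addr": "10.0.0.1"}, by_source_ip, has_source_ip))  # 10.0.0.1
print(route({"ip_dst_addr": "10.0.0.9"}, by_source_ip, has_source_ip))  # None
```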
+<div class="section">
+<h3><a name="groupBy"></a><tt>groupBy</tt></h3>
+<p><i>Optional</i></p>
+<p>One or more Stellar expressions used to group the profile measurements when persisted. This can be used to sort the profile data to allow for a contiguous scan when accessing subsets of the data. This is also one way to deal with calendar effects, for example, where activity on a weekday can be very different from activity on a weekend.</p>
+<p>A common use case would be grouping by day of week. This allows a contiguous scan to access all profile data for Mondays only. The following definition achieves this.</p>
+
+<div class="source">
+<div class="source">
+<pre>&quot;groupBy&quot;: [ &quot;DAY_OF_WEEK(start)&quot; ]
+</pre></div></div>
+<p>The expression can reference any of these variables.</p>
+
+<ul>
+  
+<li>Any variable defined by the profile in its <tt>init</tt> or <tt>update</tt> expressions.</li>
+  
+<li><tt>profile</tt> The name of the profile.</li>
+  
+<li><tt>entity</tt> The name of the entity being profiled.</li>
+  
+<li><tt>start</tt> The start time of the profile period in epoch milliseconds.</li>
+  
+<li><tt>end</tt> The end time of the profile period in epoch milliseconds.</li>
+  
+<li><tt>duration</tt> The duration of the profile period in milliseconds.</li>
+  
+<li><tt>result</tt> The result of executing the <tt>result</tt> expression.</li>
+</ul></div>
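+<p>The period variables and the day-of-week grouping can be approximated in Python. This is a hypothetical model: it assumes periods are fixed 15-minute buckets aligned to the epoch, computes the day in UTC, and returns a day name only for readability; Stellar's <tt>DAY_OF_WEEK</tt> may use a different representation and time zone.</p>

```python
from datetime import datetime, timezone

PERIOD_MS = 15 * 60 * 1000  # assumed 15-minute period
DAYS = ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday"]

def period_variables(timestamp_ms, period_ms=PERIOD_MS):
    """start, end and duration of the fixed-length period containing timestamp_ms."""
    start = (timestamp_ms // period_ms) * period_ms
    return {"start": start, "end": start + period_ms, "duration": period_ms}

def day_of_week(epoch_ms):
    """UTC day name for an epoch-millisecond timestamp."""
    return DAYS[datetime.fromtimestamp(epoch_ms / 1000, tz=timezone.utc).weekday()]

ts = int(datetime(2017, 9, 11, 12, 7, tzinfo=timezone.utc).timestamp() * 1000)
variables = period_variables(ts)
group = day_of_week(variables["start"])  # analogous to "groupBy": ["DAY_OF_WEEK(start)"]
print(group)  # Monday
```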
+<div class="section">
+<h3><a name="init"></a><tt>init</tt></h3>
+<p><i>Optional</i></p>
+<p>One or more expressions executed at the start of a window period. A map is expected where the key is the variable name and the value is a Stellar expression. The map can contain zero or more variable:expression pairs. At the start of each window period, each expression is executed once and its result is stored in the given variable. Note that constant init values such as &#x201c;0&#x201d; must be in quotes regardless of their type, as the init value must be a string to be executed by Stellar.</p>
+
+<div class="source">
+<div class="source">
+<pre>&quot;init&quot;: {
+  &quot;var1&quot;: &quot;0&quot;,
+  &quot;var2&quot;: &quot;1&quot;
+}
+</pre></div></div></div>
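+<p>Because an unquoted number such as <tt>0</tt> is easy to write by mistake, a small check can catch it before the configuration is uploaded. The helper below is a hypothetical Python sketch, not a Metron tool; the same check could be applied to the <tt>update</tt> map, whose values are also Stellar expressions.</p>

```python
import json

def non_string_expressions(profile, section="init"):
    """Names of variables in profile[section] whose value is not a JSON string.

    Each value is meant to be a Stellar expression, so 0 (a number) is flagged
    while "0" (a string) is accepted.
    """
    return [name for name, expr in profile.get(section, {}).items()
            if not isinstance(expr, str)]

good = json.loads('{"init": {"var1": "0", "var2": "1"}}')
bad = json.loads('{"init": {"var1": 0, "var2": "1"}}')
print(non_string_expressions(good))  # []
print(non_string_expressions(bad))   # ['var1']
```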
+<div class="section">
+<h3><a name="update"></a><tt>update</tt></h3>
+<p><i>Required</i></p>
+<p>One or more expressions executed when a message is applied to the profile. A map is expected where the key is the variable name and the value is a Stellar expression. The map can include zero or more variable:expression pairs. When each message is applied to the profile, each expression is executed and its result is stored in a variable with the given name.</p>
+
+<div class="source">
+<div class="source">
+<pre>&quot;update&quot;: {
+  &quot;var1&quot;: &quot;var1 + 1&quot;,
+  &quot;var2&quot;: &quot;var2 + 1&quot;
+}
+</pre></div></div></div>
+<div class="section">
+<h3><a name="result"></a><tt>result</tt></h3>
+<p><i>Required</i></p>
+<p>Stellar expressions that are executed when the window period expires. The expressions are expected to summarize the messages that were applied to the profile over the window period. In the most basic form, a single result is persisted for later retrieval.</p>
+
+<div class="source">
+<div class="source">
+<pre>&quot;result&quot;: &quot;var1 + var2&quot;
+</pre></div></div>
+<p>For more advanced use cases, a profile can generate two types of results. A profile can define one or both of these result types at the same time.</p>
+
+<ul>
+  
+<li><tt>profile</tt>: A required expression that defines a value that is persisted for later retrieval.</li>
+  
+<li><tt>triage</tt>: An optional expression that defines values that are accessible within the Threat Triage process.</li>
+</ul>
+<p><b>profile</b></p>
+<p>A required Stellar expression that results in a value that is persisted in the profile store for later retrieval. The expression can result in any object that is Kryo serializable. These values can be retrieved for later use with the <a href="../metron-profiler-client/index.html">Profiler Client</a>.</p>
+
+<div class="source">
+<div class="source">
+<pre>&quot;result&quot;: {
+    &quot;profile&quot;: &quot;2 + 2&quot;
+}
+</pre></div></div>
+<p>An alternative, simplified form is also acceptable.</p>
+
+<div class="source">
+<div class="source">
+<pre>&quot;result&quot;: &quot;2 + 2&quot;
+</pre></div></div>
+<p><b>triage</b></p>
+<p>An optional map of one or more Stellar expressions. The value of each expression is made available to the Threat Triage process under the given name. Each expression must result in either a primitive type, like an integer, long, or short, or a String. All other types will result in an error.</p>
+<p>In the following example, three values (the minimum, the maximum, and the mean) are appended to a message. This message is consumed by Metron like any other source of telemetry, and each of these values is accessible from within the Threat Triage process using the given field names: <tt>min</tt>, <tt>max</tt>, and <tt>mean</tt>.</p>
+
+<div class="source">
+<div class="source">
+<pre>&quot;result&quot;: {
+    &quot;triage&quot;: {
+        &quot;min&quot;: &quot;STATS_MIN(stats)&quot;,
+        &quot;max&quot;: &quot;STATS_MAX(stats)&quot;,
+        &quot;mean&quot;: &quot;STATS_MEAN(stats)&quot;
+    }
+}
+</pre></div></div></div>
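+<p>The two accepted shapes of <tt>result</tt>, and the type restriction on triage values, can be summarized in a short Python sketch. The function names are hypothetical, the triage check operates on already-evaluated values, and Python's <tt>int</tt>, <tt>float</tt>, and <tt>str</tt> are used as rough stand-ins for Java primitives and <tt>String</tt>.</p>

```python
def normalize_result(result):
    """Return (profile_expression, triage_map) for either form of the result field.

    A bare string is the simplified form and is treated like {"profile": <string>}.
    """
    if isinstance(result, str):
        return result, {}
    return result.get("profile"), result.get("triage", {})

# Assumption: Python int, float and str stand in for Java primitives and String.
ALLOWED_TRIAGE_TYPES = (int, float, str)

def invalid_triage_values(values):
    """Names of evaluated triage values whose type would not be accepted."""
    return [name for name, value in values.items()
            if not isinstance(value, ALLOWED_TRIAGE_TYPES)]

print(normalize_result("2 + 2"))               # ('2 + 2', {})
print(normalize_result({"profile": "2 + 2"}))  # ('2 + 2', {})
print(invalid_triage_values({"min": 1, "max": 9.5, "mean": [1, 2]}))  # ['mean']
```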
+<div class="section">
+<h3><a name="expires"></a><tt>expires</tt></h3>
+<p><i>Optional</i></p>
+<p>A numeric value that defines how many days the profile data is retained. After this time, the data expires and is no longer accessible. If no value is defined, the data does not expire.</p>
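+<p>The retention rule can be expressed as a simple comparison. This Python sketch is hypothetical and only models the rule as described here; it does not show how or when the Profiler actually removes expired data, and treating data exactly <tt>expires</tt> days old as still retained is an arbitrary choice.</p>

```python
DAY_MS = 24 * 60 * 60 * 1000

def is_expired(written_ms, now_ms, expires_days=None):
    """True if data written at written_ms is older than the retention period.

    expires_days=None models a profile without an expires value: it never expires.
    """
    if expires_days is None:
        return False
    return now_ms - written_ms > expires_days * DAY_MS

print(is_expired(0, 2 * DAY_MS, expires_days=1))  # True
print(is_expired(0, DAY_MS, expires_days=1))      # False
print(is_expired(0, 10 * DAY_MS))                 # False
```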
+<p>The REPL can be a powerful tool for developing profiles. Read all about <a href="../metron-profiler-client/index.html#developing_profiles">Developing Profiles</a>.</p></div></div>
+<div class="section">
+<h2><a name="Configuring_the_Profiler"></a>Configuring the Profiler</h2>
+<p>The Profiler runs as an independent Storm topology. The configuration for the Profiler topology is stored in the local filesystem at <tt>$METRON_HOME/config/profiler.properties</tt>. After the values are changed on disk, the Profiler topology must be restarted for the changes to take effect.</p>
+
+<table border="0" class="table table-striped">
+  <thead>
+    
+<tr class="a">
+      
+<th>Setting </th>
+      
+<th>Description</th>
+    </tr>
+  </thead>
+  <tbody>
+    
+<tr class="b">
+      
+<td><a href="#profiler.input.topic"><tt>profiler.input.topic</tt></a> </td>
+      
+<td>The name of the Kafka topic from which to consume data.</td>
+    </tr>
+    
+<tr class="a">
+      
+<td><a href="#profiler.output.topic"><tt>profiler.output.topic</tt></a> </td>
+      
+<td>The name of the Kafka topic to which profile data is written. Only used with profiles that define the <a href="#result"><tt>triage</tt> result field</a>.</td>
+    </tr>
+    
+<tr class="b">
+      
+<td><a href="#profiler.period.duration"><tt>profiler.period.duration</tt></a> </td>
+      
+<td>The duration of each profile period.</td>
+    </tr>
+    
+<tr class="a">
+      
+<td><a href="#profiler.period.duration.units"><tt>profiler.period.duration.units</tt></a> </td>
+      
+<td>The units used to specify the <a href="#profiler.period.duration"><tt>profiler.period.duration</tt></a>.</td>
+    </tr>
+    
+<tr class="b">
+      
+<td><a href="#profiler.workers"><tt>profiler.workers</tt></a> </td>
+      
+<td>The number of worker processes for the topology.</td>
+    </tr>
+    
+<tr class="a">
+      
+<td><a href="#profiler.executors"><tt>profiler.executors</tt></a> </td>
+      
+<td>The number of executors to spawn per component.</td>
+    </tr>
+    
+<tr class="b">
+      
+<td><a href="#profiler.ttl"><tt>profiler.ttl</tt></a> </td>
+      
+<td>If a message has not been applied to a Profile in this period of time, the Profile will be forgotten and its resources will be cleaned up.</td>
+    </tr>
+    
+<tr class="a">
+      
+<td><a href="#profiler.ttl.units"><tt>profiler.ttl.units</tt></a> </td>
+      
+<td>The units used to specify the <tt>profiler.ttl</tt>.</td>
+    </tr>
+    
+<tr class="b">
+      
+<td><a href="#profiler.hbase.salt.divisor"><tt>profiler.hbase.salt.divisor</tt></a> </td>
+      
+<td>A salt is prepended to the row key to help prevent hotspotting.</td>
+    </tr>
+    
+<tr class="a">
+      
+<td><a href="#profiler.hbase.table"><tt>profiler.hbase.table</tt></a> </td>
+      
+<td>The name of the HBase table that profiles are written to.</td>
+    </tr>
+    
+<tr class="b">
+      
+<td><a href="#profiler.hbase.column.family"><tt>profiler.hbase.column.family</tt></a> </td>
+      
+<td>The column family used to store profiles.</td>
+    </tr>
+    
+<tr class="a">
+      
+<td><a href="#profiler.hbase.batch"><tt>profiler.hbase.batch</tt></a> </td>
+      
+<td>The number of puts that are written to HBase in a single batch.</td>
+    </tr>
+    
+<tr class="b">
+      
+<td><a href="#profiler.hbase.flush.interval.seconds"><tt>profiler.hbase.flush.interval.seconds</tt></a> </td>
+      
+<td>The maximum number of seconds between batch writes to HBase.</td>
+    </tr>
+  </tbody>
+</table>
+<div class="section">
+<h3><a name="profiler.input.topic"></a><tt>profiler.input.topic</tt></h3>
+<p><i>Default</i>: indexing</p>
+<p>The name of the Kafka topic from which to consume data. By default, the Profiler consumes data from the <tt>indexing</tt> topic so that it has access to fully enriched telemetry.</p></div>
+<div class="section">
+<h3><a name="profiler.output.topic"></a><tt>profiler.output.topic</tt></h3>
+<p><i>Default</i>: enrichments</p>
+<p>The name of the Kafka topic to which profile data is written. This property is only applicable to profiles that define the <a href="#result"><tt>result</tt> <tt>triage</tt> field</a>. This allows Profile data to be selectively triaged like any other source of telemetry in Metron.</p></div>
+<div class="section">
+<h3><a 
name="profiler.period.duration"></a><tt>profiler.period.duration</tt></h3>
+<p><i>Default</i>: 15</p>
+<p>The duration of each profile period. This value should be defined along 
with <a 
href="#profiler.period.duration.units"><tt>profiler.period.duration.units</tt></a>.</p>
+<p><i>Important</i>: To read a profile using the <a href="../metron-profiler-client/index.html">Profiler Client</a>, the Profiler Client&#x2019;s <tt>profiler.client.period.duration</tt> property must match this value. Otherwise, the Profiler Client will be unable to read the profile data.</p></div>
+<div class="section">
+<h3><a 
name="profiler.period.duration.units"></a><tt>profiler.period.duration.units</tt></h3>
+<p><i>Default</i>: MINUTES</p>
+<p>The units used to specify the <tt>profiler.period.duration</tt>. This value 
should be defined along with <a 
href="#profiler.period.duration"><tt>profiler.period.duration</tt></a>.</p>
+<p><i>Important</i>: To read a profile using the <a href="../metron-profiler-client/index.html">Profiler Client</a>, the Profiler Client&#x2019;s <tt>profiler.client.period.duration.units</tt> property must match this value. Otherwise, the Profiler Client will be unable to read the profile data.</p></div>
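+<p>Both properties must match because the period that a measurement belongs to is derived from the period duration. The Python sketch below assumes that a period index is computed by integer division of epoch milliseconds by the period length; the helpers <tt>period_millis</tt> and <tt>period_index</tt> are hypothetical and are not part of Metron&#x2019;s API.</p>

```python
# Hypothetical sketch: how a period duration maps timestamps to periods.
# Assumes a period index is epoch milliseconds // period length in millis.
from datetime import timedelta

# Maps profiler.period.duration.units values to timedelta keyword names.
UNITS = {"SECONDS": "seconds", "MINUTES": "minutes", "HOURS": "hours", "DAYS": "days"}


def period_millis(duration: int, units: str) -> int:
    """Convert a duration such as (15, 'MINUTES') into milliseconds."""
    return int(timedelta(**{UNITS[units]: duration}).total_seconds() * 1000)


def period_index(epoch_millis: int, duration: int, units: str) -> int:
    """Return the index of the fixed period that contains the timestamp."""
    return epoch_millis // period_millis(duration, units)


if __name__ == "__main__":
    ts = 1_505_518_666_000  # an arbitrary timestamp in epoch milliseconds
    written = period_index(ts, 15, "MINUTES")   # Profiler writes with 15 MINUTES
    matching = period_index(ts, 15, "MINUTES")  # client configured to match
    mismatch = period_index(ts, 1, "HOURS")     # client configured differently
    print(written == matching, written == mismatch)
```

+<p>With matching settings the reader computes the same period index as the writer; with a one-hour setting it would look up a different period entirely.</p>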
+<div class="section">
+<h3><a name="profiler.workers"></a><tt>profiler.workers</tt></h3>
+<p><i>Default</i>: 1</p>
+<p>The number of worker processes to create for the Profiler topology. This 
property is useful for performance tuning the Profiler.</p></div>
+<div class="section">
+<h3><a name="profiler.executors"></a><tt>profiler.executors</tt></h3>
+<p><i>Default</i>: 0</p>
+<p>The number of executors to spawn per component for the Profiler topology. 
This property is useful for performance tuning the Profiler.</p></div>
+<div class="section">
+<h3><a name="profiler.ttl"></a><tt>profiler.ttl</tt></h3>
+<p><i>Default</i>: 30</p>
+<p>If a message has not been applied to a Profile in this period of time, the 
Profile will be terminated and its resources will be cleaned up. This value 
should be defined along with <a 
href="#profiler.ttl.units"><tt>profiler.ttl.units</tt></a>.</p>
+<p>This time-to-live does not affect the persisted Profile data in HBase. It 
only affects the state stored in memory during the execution of the latest 
profile period. This state will be deleted if the time-to-live is 
exceeded.</p></div>
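+<p>A minimal sketch of this time-to-live behavior, assuming in-memory state is keyed by profile and entity and that expiry is measured from the last time a message was applied. The class <tt>ProfileStateCache</tt> is hypothetical and is not Metron&#x2019;s implementation; like the real setting, it never touches data already persisted to HBase.</p>

```python
# Hypothetical sketch of TTL eviction for in-memory profile state.
# Not Metron's implementation; persisted HBase data is out of scope here.
from typing import Any, Dict, List, Tuple

Key = Tuple[str, str]  # (profile name, entity)


class ProfileStateCache:
    def __init__(self, ttl_millis: int) -> None:
        self.ttl_millis = ttl_millis
        self.state: Dict[Key, Any] = {}
        self.last_seen: Dict[Key, int] = {}

    def apply(self, key: Key, value: Any, now_millis: int) -> None:
        """Store updated state and remember when a message was last applied."""
        self.state[key] = value
        self.last_seen[key] = now_millis

    def expire(self, now_millis: int) -> List[Key]:
        """Drop and return every key idle for longer than the TTL."""
        expired = [k for k, seen in self.last_seen.items()
                   if now_millis - seen > self.ttl_millis]
        for k in expired:
            del self.state[k]
            del self.last_seen[k]
        return expired


if __name__ == "__main__":
    cache = ProfileStateCache(30 * 60 * 1000)  # profiler.ttl = 30 MINUTES
    cache.apply(("example1", "10.0.0.2"), 390.0, now_millis=0)
    cache.apply(("example1", "10.0.0.3"), 0.0, now_millis=20 * 60 * 1000)
    # At 40 minutes only the first entry has been idle for more than 30 minutes.
    print(cache.expire(now_millis=40 * 60 * 1000))
```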
+<div class="section">
+<h3><a name="profiler.ttl.units"></a><tt>profiler.ttl.units</tt></h3>
+<p><i>Default</i>: MINUTES</p>
+<p>The units used to specify the <a 
href="#profiler.ttl"><tt>profiler.ttl</tt></a>.</p></div>
+<div class="section">
+<h3><a 
name="profiler.hbase.salt.divisor"></a><tt>profiler.hbase.salt.divisor</tt></h3>
+<p><i>Default</i>: 1000</p>
+<p>A salt is prepended to the row key to help prevent hotspotting. This constant is used to generate the salt. It should be roughly equal to the number of nodes in the HBase cluster to ensure an even distribution of data.</p></div>
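+<p>The sketch below illustrates how a divisor bounds the number of distinct salt prefixes. It assumes the salt is a hash of the period index reduced modulo the divisor, so that writes for successive periods spread across up to <tt>profiler.hbase.salt.divisor</tt> key prefixes instead of all sorting next to each other. The hash choice and the function names are assumptions for illustration, not Metron&#x2019;s exact scheme.</p>

```python
# Hypothetical sketch of row-key salting with a salt divisor.
# Assumption: salt = hash(period index) mod divisor; Metron may differ.
import hashlib
from collections import Counter


def salt_for(period: int, salt_divisor: int) -> int:
    """Map a period index to a salt in the range 0 .. salt_divisor - 1."""
    digest = hashlib.md5(period.to_bytes(8, "big", signed=True)).digest()
    return int.from_bytes(digest[:4], "big") % salt_divisor


def spread(num_periods: int, salt_divisor: int) -> Counter:
    """Count how many consecutive periods land on each salt value."""
    return Counter(salt_for(p, salt_divisor) for p in range(num_periods))


if __name__ == "__main__":
    # Consecutive periods would otherwise sort together and hit one region;
    # with a divisor of 4 they are spread over 4 key prefixes.
    print(sorted(spread(1000, 4).items()))
```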
+<div class="section">
+<h3><a name="profiler.hbase.table"></a><tt>profiler.hbase.table</tt></h3>
+<p><i>Default</i>: profiler</p>
+<p>The name of the HBase table that profile data is written to. The Profiler 
expects that the table exists and is writable. It will not create the 
table.</p></div>
+<div class="section">
+<h3><a 
name="profiler.hbase.column.family"></a><tt>profiler.hbase.column.family</tt></h3>
+<p><i>Default</i>: P</p>
+<p>The column family used to store profile data in HBase.</p></div>
+<div class="section">
+<h3><a name="profiler.hbase.batch"></a><tt>profiler.hbase.batch</tt></h3>
+<p><i>Default</i>: 10</p>
+<p>The number of puts that are written to HBase in a single batch.</p></div>
+<div class="section">
+<h3><a 
name="profiler.hbase.flush.interval.seconds"></a><tt>profiler.hbase.flush.interval.seconds</tt></h3>
+<p><i>Default</i>: 30</p>
+<p>The maximum number of seconds between batch writes to HBase.</p></div></div>
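+<p>Together, <tt>profiler.hbase.batch</tt> and <tt>profiler.hbase.flush.interval.seconds</tt> bound a write buffer by size and by age. The following is a minimal sketch of that idea, not Metron&#x2019;s <tt>HBaseBolt</tt>; the class <tt>BatchingWriter</tt> is hypothetical and its <tt>sink</tt> callable stands in for an HBase client.</p>

```python
# Hypothetical sketch of batched writes bounded by batch size and flush interval.
# The sink callable stands in for an HBase client; not Metron's HBaseBolt.
from typing import Callable, List


class BatchingWriter:
    def __init__(self, sink: Callable[[List[object]], None], batch_size: int = 10,
                 flush_interval_seconds: float = 30.0, start: float = 0.0) -> None:
        self.sink = sink
        self.batch_size = batch_size                 # like profiler.hbase.batch
        self.flush_interval = flush_interval_seconds  # like ...flush.interval.seconds
        self.pending: List[object] = []
        self.last_flush = start                      # time of the previous flush

    def put(self, item: object, now: float) -> None:
        """Buffer one put; flush when the batch is full or has grown too old."""
        self.pending.append(item)
        if len(self.pending) >= self.batch_size:
            self.flush(now)
        else:
            self.tick(now)

    def tick(self, now: float) -> None:
        """Flush a partial batch once the flush interval has elapsed."""
        if self.pending and now - self.last_flush >= self.flush_interval:
            self.flush(now)

    def flush(self, now: float) -> None:
        """Hand the buffered puts to the sink as a single batch."""
        if self.pending:
            self.sink(self.pending)
            self.pending = []
        self.last_flush = now


if __name__ == "__main__":
    batches: List[List[object]] = []
    writer = BatchingWriter(batches.append, batch_size=3, flush_interval_seconds=30)
    for i in range(7):
        writer.put(f"put-{i}", now=float(i))
    writer.tick(now=40.0)  # the leftover put is flushed by the interval
    print([len(b) for b in batches])
```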
+<div class="section">
+<h2><a name="Examples"></a>Examples</h2>
+<p>The following examples are intended to highlight the functionality provided 
by the Profiler. Each shows the configuration that would be required to 
generate the profile. </p>
+<p>These examples assume a fictitious input message stream that looks 
something like the following.</p>
+
+<div class="source">
+<div class="source">
+<pre>{
+  &quot;ip_src_addr&quot;: &quot;10.0.0.1&quot;,
+  &quot;protocol&quot;: &quot;HTTPS&quot;,
+  &quot;length&quot;: &quot;10&quot;,
+  &quot;bytes_in&quot;: &quot;234&quot;
+},
+{
+  &quot;ip_src_addr&quot;: &quot;10.0.0.2&quot;,
+  &quot;protocol&quot;: &quot;HTTP&quot;,
+  &quot;length&quot;: &quot;20&quot;,
+  &quot;bytes_in&quot;: &quot;390&quot;
+},
+{
+  &quot;ip_src_addr&quot;: &quot;10.0.0.3&quot;,
+  &quot;protocol&quot;: &quot;DNS&quot;,
+  &quot;length&quot;: &quot;30&quot;,
+  &quot;bytes_in&quot;: &quot;560&quot;
+}
+</pre></div></div>
+<div class="section">
+<h3><a name="Example_1"></a>Example 1</h3>
+<p>The total number of bytes of HTTP data for each host. The following 
configuration would be used to generate this profile.</p>
+
+<div class="source">
+<div class="source">
+<pre>{
+  &quot;profiles&quot;: [
+    {
+      &quot;profile&quot;: &quot;example1&quot;,
+      &quot;foreach&quot;: &quot;ip_src_addr&quot;,
+      &quot;onlyif&quot;: &quot;protocol == 'HTTP'&quot;,
+      &quot;init&quot;: {
+        &quot;total_bytes&quot;: 0.0
+      },
+      &quot;update&quot;: {
+        &quot;total_bytes&quot;: &quot;total_bytes + bytes_in&quot;
+      },
+      &quot;result&quot;: &quot;total_bytes&quot;,
+      &quot;expires&quot;: 30
+    }
+  ]
+}
+</pre></div></div>
+<p>This creates a profile&#x2026;</p>
+
+<ul>
+  
+<li>Named &#x2018;example1&#x2019;</li>
+  
+<li>That is computed separately for each IP source address</li>
+  
+<li>Only if the &#x2018;protocol&#x2019; field equals &#x2018;HTTP&#x2019;</li>
+  
+<li>Initializes a counter &#x2018;total_bytes&#x2019; to zero</li>
+  
+<li>Adds to &#x2018;total_bytes&#x2019; the value of the message&#x2019;s 
&#x2018;bytes_in&#x2019; field</li>
+  
+<li>Returns &#x2018;total_bytes&#x2019; as the result</li>
+  
+<li>The profile data will expire in 30 days</li>
+</ul></div>
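+<p>To make the steps above concrete, here is a plain-Python model of what the <tt>example1</tt> profile computes over the sample messages. It is not how Stellar evaluates the profile and it ignores profile periods and expiry; numeric fields are converted with <tt>float()</tt> because the sample messages carry them as strings.</p>

```python
# Plain-Python model of the 'example1' profile; not Stellar, and no periods or expiry.
messages = [
    {"ip_src_addr": "10.0.0.1", "protocol": "HTTPS", "length": "10", "bytes_in": "234"},
    {"ip_src_addr": "10.0.0.2", "protocol": "HTTP", "length": "20", "bytes_in": "390"},
    {"ip_src_addr": "10.0.0.3", "protocol": "DNS", "length": "30", "bytes_in": "560"},
]


def example1(msgs):
    state = {}
    for msg in msgs:
        if msg["protocol"] != "HTTP":                  # onlyif: protocol == 'HTTP'
            continue
        entity = msg["ip_src_addr"]                     # foreach: ip_src_addr
        total = state.setdefault(entity, 0.0)           # init: total_bytes = 0.0
        state[entity] = total + float(msg["bytes_in"])  # update: + bytes_in
    return state                                        # result: total_bytes per host


print(example1(messages))
```

+<p>Only 10.0.0.2 contributes, because the 10.0.0.1 message uses HTTPS, which does not equal &#x2018;HTTP&#x2019;.</p>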
+<div class="section">
+<h3><a name="Example_2"></a>Example 2</h3>
+<p>The ratio of DNS traffic to HTTP traffic for each host. The following 
configuration would be used to generate this profile.</p>
+
+<div class="source">
+<div class="source">
+<pre>{
+  &quot;profiles&quot;: [
+    {
+      &quot;profile&quot;: &quot;example2&quot;,
+      &quot;foreach&quot;: &quot;ip_src_addr&quot;,
+      &quot;onlyif&quot;: &quot;protocol == 'DNS' or protocol == 'HTTP'&quot;,
+      &quot;init&quot;: {
+        &quot;num_dns&quot;: 1.0,
+        &quot;num_http&quot;: 1.0
+      },
+      &quot;update&quot;: {
+        &quot;num_dns&quot;: &quot;num_dns + (if protocol == 'DNS' then 1 else 0)&quot;,
+        &quot;num_http&quot;: &quot;num_http + (if protocol == 'HTTP' then 1 else 0)&quot;
+      },
+      &quot;result&quot;: &quot;num_dns / num_http&quot;
+    }
+  ]
+}
+</pre></div></div>
+<p>This creates a profile&#x2026;</p>
+
+<ul>
+  
+<li>Named &#x2018;example2&#x2019;</li>
+  
+<li>That is computed separately for each IP source address</li>
+  
+<li>Only if the &#x2018;protocol&#x2019; field equals &#x2018;HTTP&#x2019; or 
&#x2018;DNS&#x2019;</li>
+  
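+<li>Initializes both counters to one, so the ratio never divides by zero when a host has no HTTP traffic</li>
+  
+<li>Accumulates the number of DNS requests</li>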
+  
+<li>Accumulates the number of HTTP requests</li>
+  
+<li>Returns the ratio of these as the result</li>
+</ul></div>
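+<p>A plain-Python model of the <tt>example2</tt> logic over the sample messages, for illustration only; it does not reproduce Stellar evaluation or profile periods, and only the fields this profile reads are included.</p>

```python
# Plain-Python model of the 'example2' profile; not Stellar.
messages = [
    {"ip_src_addr": "10.0.0.1", "protocol": "HTTPS"},
    {"ip_src_addr": "10.0.0.2", "protocol": "HTTP"},
    {"ip_src_addr": "10.0.0.3", "protocol": "DNS"},
]


def example2(msgs):
    counts = {}
    for msg in msgs:
        protocol = msg["protocol"]
        if protocol not in ("DNS", "HTTP"):            # onlyif
            continue
        # init: both counters start at 1.0, so num_http is never zero
        c = counts.setdefault(msg["ip_src_addr"], {"num_dns": 1.0, "num_http": 1.0})
        c["num_dns"] += 1 if protocol == "DNS" else 0   # update
        c["num_http"] += 1 if protocol == "HTTP" else 0
    # result: num_dns / num_http for each host
    return {host: c["num_dns"] / c["num_http"] for host, c in counts.items()}


print(example2(messages))
```

+<p>Starting both counters at one keeps the ratio defined, at the cost of pulling it toward 1.0 for hosts that have seen only a few messages.</p>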
+<div class="section">
+<h3><a name="Example_3"></a>Example 3</h3>
+<p>The average of the <tt>length</tt> field of HTTP traffic for each host. The following configuration would be used to generate this profile.</p>
+
+<div class="source">
+<div class="source">
+<pre>{
+  &quot;profiles&quot;: [
+    {
+      &quot;profile&quot;: &quot;example3&quot;,
+      &quot;foreach&quot;: &quot;ip_src_addr&quot;,
+      &quot;onlyif&quot;: &quot;protocol == 'HTTP'&quot;,
+      &quot;update&quot;: { &quot;s&quot;: &quot;STATS_ADD(s, length)&quot; },
+      &quot;result&quot;: &quot;STATS_MEAN(s)&quot;
+    }
+  ]
+}
+</pre></div></div>
+<p>This creates a profile&#x2026;</p>
+
+<ul>
+  
+<li>Named &#x2018;example3&#x2019;</li>
+  
+<li>That is computed separately for each IP source address</li>
+  
+<li>Only if the &#x2018;protocol&#x2019; field is &#x2018;HTTP&#x2019;</li>
+  
+<li>Adds the <tt>length</tt> field from each message to a statistical summary, <tt>s</tt></li>
+  
+<li>Returns the mean of that summary as the result</li>
+</ul></div>
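+<p>The following plain-Python model shows what <tt>example3</tt> computes. A running count and sum stand in for the summary that <tt>STATS_ADD</tt> maintains; this is an assumption made for illustration, not how Stellar stores it. A hypothetical fourth HTTP message from 10.0.0.2 is added so that the average covers more than one value.</p>

```python
# Plain-Python model of the 'example3' profile; a [count, sum] pair stands in
# for the STATS_ADD summary. The fourth message is hypothetical.
messages = [
    {"ip_src_addr": "10.0.0.1", "protocol": "HTTPS", "length": "10"},
    {"ip_src_addr": "10.0.0.2", "protocol": "HTTP", "length": "20"},
    {"ip_src_addr": "10.0.0.3", "protocol": "DNS", "length": "30"},
    {"ip_src_addr": "10.0.0.2", "protocol": "HTTP", "length": "40"},  # hypothetical
]


def example3(msgs):
    stats = {}
    for msg in msgs:
        if msg["protocol"] != "HTTP":                         # onlyif
            continue
        s = stats.setdefault(msg["ip_src_addr"], [0, 0.0])    # summary: [count, sum]
        s[0] += 1                                             # update: STATS_ADD(s, length)
        s[1] += float(msg["length"])
    return {host: total / count for host, (count, total) in stats.items()}  # STATS_MEAN(s)


print(example3(messages))
```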
+<div class="section">
+<h3><a name="Example_4"></a>Example 4</h3>
+<p>It is important to note that the Profiler can persist any serializable 
Object, not just numeric values. An alternative to the previous example could 
take advantage of this. </p>
+<p>Instead of storing the mean of the lengths, the profile could store a 
statistical summarization of the lengths. This summary can then be used at a 
later time to calculate the mean, min, max, percentiles, or any other sensible 
metric. This provides a much greater degree of flexibility.</p>
+
+<div class="source">
+<div class="source">
+<pre>{
+  &quot;profiles&quot;: [
+    {
+      &quot;profile&quot;: &quot;example4&quot;,
+      &quot;foreach&quot;: &quot;ip_src_addr&quot;,
+      &quot;onlyif&quot;: &quot;protocol == 'HTTP'&quot;,
+      &quot;update&quot;: { &quot;s&quot;: &quot;STATS_ADD(s, length)&quot; },
+      &quot;result&quot;: &quot;s&quot;
+    }
+  ]
+}
+</pre></div></div>
+<p>The following Stellar REPL session shows how you might use this summary to calculate different metrics from the same underlying profile data. It assumes that <tt>PROFILE_GET</tt> is configured as described in the <a href="../metron-profiler-client/index.html">Profiler Client</a> documentation.</p>
+<p>Retrieve the last 30 minutes of profile measurements for a specific 
host.</p>
+
+<div class="source">
+<div class="source">
+<pre>$ bin/stellar -z node1:2181
+
+[Stellar]&gt;&gt;&gt; stats := PROFILE_GET( &quot;example4&quot;, &quot;10.0.0.1&quot;, PROFILE_FIXED(30, &quot;MINUTES&quot;))
+[Stellar]&gt;&gt;&gt; stats
+[org.apache.metron.common.math.stats.OnlineStatisticsProvider@79fe4ab9, ...]
+</pre></div></div>
+<p>Calculate different metrics with the same profile data.</p>
+
+<div class="source">
+<div class="source">
+<pre>[Stellar]&gt;&gt;&gt; STATS_MEAN( GET_FIRST( stats))
+15979.0625
+
+[Stellar]&gt;&gt;&gt; STATS_PERCENTILE( GET_FIRST(stats), 90)
+30310.958
+</pre></div></div>
+<p>Merge all of the profile measurements over the past 30 minutes into a 
single summary and calculate the 90th percentile.</p>
+
+<div class="source">
+<div class="source">
+<pre>[Stellar]&gt;&gt;&gt; merged := STATS_MERGE( stats)
+[Stellar]&gt;&gt;&gt; STATS_PERCENTILE(merged, 90)
+29810.992
+</pre></div></div>
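+<p>Merging is what makes storing the summary worthwhile: summaries from several periods combine exactly, while stored means cannot be combined correctly unless their counts are known. The toy <tt>Summary</tt> class below is a hypothetical illustration that tracks only count, sum, min, and max; it is not Metron&#x2019;s <tt>OnlineStatisticsProvider</tt>, which also supports percentile estimates such as <tt>STATS_PERCENTILE</tt>.</p>

```python
# Toy mergeable summary; hypothetical, not Metron's OnlineStatisticsProvider.
from dataclasses import dataclass


@dataclass
class Summary:
    count: int = 0
    total: float = 0.0
    minimum: float = float("inf")
    maximum: float = float("-inf")

    def add(self, x: float) -> "Summary":
        """Fold one value into the summary (like STATS_ADD)."""
        self.count += 1
        self.total += x
        self.minimum = min(self.minimum, x)
        self.maximum = max(self.maximum, x)
        return self

    def merge(self, other: "Summary") -> "Summary":
        """Combine two summaries into a new one (like STATS_MERGE)."""
        return Summary(self.count + other.count, self.total + other.total,
                       min(self.minimum, other.minimum), max(self.maximum, other.maximum))

    def mean(self) -> float:
        return self.total / self.count


period_a = Summary()
for x in (10.0, 20.0, 30.0):
    period_a.add(x)
period_b = Summary().add(100.0)

merged = period_a.merge(period_b)
mean_of_means = (period_a.mean() + period_b.mean()) / 2  # ignores counts: wrong
print(merged.mean(), mean_of_means, merged.minimum, merged.maximum)
```

+<p>The merged mean is taken over all four values, whereas averaging the two stored means weights a single-value period as heavily as a three-value one.</p>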
+<p>More information on accessing profile data can be found in the <a 
href="../metron-profiler-client/index.html">Profiler Client</a>.</p>
+<p>See the <a href="../../metron-platform/metron-common/index.html">documentation of the <tt>STATS_*</tt> functions in Stellar</a> for more information on using them.</p></div></div>
+<div class="section">
+<h2><a name="Implementation"></a>Implementation</h2></div>
+<div class="section">
+<h2><a name="Key_Classes"></a>Key Classes</h2>
+
+<ul>
+  
+<li>
+<p><tt>ProfileMeasurement</tt> - Represents a single data point within a Profile. A Profile is effectively a time series composed of many <tt>ProfileMeasurement</tt> values.</p></li>
+  
+<li>
+<p><tt>ProfilePeriod</tt> - The Profiler captures one <tt>ProfileMeasurement</tt> each <tt>ProfilePeriod</tt>. Each <tt>ProfilePeriod</tt> begins and ends at fixed, deterministic points in time, so the period that contains any timestamp can be computed directly. This allows for efficient retrieval of profile data.</p></li>
+  
+<li>
+<p><tt>RowKeyBuilder</tt> - Builds the row keys used to read profile data from, and write profile data to, HBase.</p></li>
+  
+<li>
+<p><tt>ColumnBuilder</tt> - Defines the columns of data stored with a profile 
measurement.</p></li>
+  
+<li>
+<p><tt>ProfileHBaseMapper</tt> - Defines for the <tt>HBaseBolt</tt> how 
profile measurements are stored in HBase. This class leverages a 
<tt>RowKeyBuilder</tt> and <tt>ColumnBuilder</tt>.</p></li>
+</ul></div>
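+<p>The interplay of <tt>ProfilePeriod</tt> and <tt>RowKeyBuilder</tt> can be sketched as follows, under stated assumptions: a period index is epoch milliseconds divided by the period length, and a row key is a salt followed by the profile name, entity, and period. The byte layout, separator, and hash are hypothetical, and Metron&#x2019;s actual row keys may differ. The point is that every key in a time range can be computed up front rather than discovered by scanning.</p>

```python
# Hypothetical sketch of ProfilePeriod + RowKeyBuilder behavior.
# Assumed layout: salt + profile + SEP + entity + SEP + period; Metron may differ.
import hashlib
import struct

SEP = b"\x00"  # hypothetical separator between variable-length fields


def period_of(epoch_millis: int, duration_millis: int) -> int:
    """Index of the fixed period that contains a timestamp."""
    return epoch_millis // duration_millis


def row_key(profile: str, entity: str, period: int, salt_divisor: int) -> bytes:
    """Build a salted row key for one profile, entity, and period."""
    digest = hashlib.md5(struct.pack(">q", period)).digest()
    salt = int.from_bytes(digest[:4], "big") % salt_divisor
    return (struct.pack(">i", salt) + profile.encode() + SEP
            + entity.encode() + SEP + struct.pack(">q", period))


def keys_for_range(profile, entity, start_millis, end_millis, duration_millis, salt_divisor):
    """Compute every row key in a time range directly, with no scan needed."""
    first = period_of(start_millis, duration_millis)
    last = period_of(end_millis, duration_millis)
    return [row_key(profile, entity, p, salt_divisor) for p in range(first, last + 1)]


if __name__ == "__main__":
    now = 1_505_518_666_000
    keys = keys_for_range("example4", "10.0.0.1", now - 30 * 60 * 1000, now,
                          15 * 60 * 1000, 1000)
    print(len(keys))  # 30 minutes touches three 15-minute periods here
```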
+<div class="section">
+<h2><a name="Storm_Topology"></a>Storm Topology</h2>
+<p>The Profiler is implemented as a Storm topology using the following bolts 
and spouts.</p>
+
+<ul>
+  
+<li>
+<p><tt>KafkaSpout</tt> - A spout that consumes messages from a single Kafka 
topic. In most cases, the Profiler topology will consume messages from the 
<tt>indexing</tt> topic. This topic contains fully enriched messages that are 
ready to be indexed. This ensures that profiles can take advantage of all the 
available data elements.</p></li>
+  
+<li>
+<p><tt>ProfileSplitterBolt</tt> - The bolt responsible for filtering incoming messages and directing each to one or more downstream bolts that are responsible for building a profile. Each message may be needed by zero, one, or many profiles. Each emitted tuple contains the &#x2018;resolved&#x2019; entity name, the profile definition, and the input message.</p></li>
+  
+<li>
+<p><tt>ProfileBuilderBolt</tt> - This bolt maintains all of the state required 
to build a profile. When the window period expires, the data is summarized as a 
<tt>ProfileMeasurement</tt>, all state is flushed, and the 
<tt>ProfileMeasurement</tt> is emitted. Each instance of this bolt is 
responsible for maintaining the state for a single Profile-Entity pair.</p></li>
+  
+<li>
+<p><tt>HBaseBolt</tt> - A bolt that is responsible for writing to HBase. Most profiles will be flushed every 15 minutes or so. If each <tt>ProfileBuilderBolt</tt> were responsible for writing to HBase itself, there would be little to no opportunity to optimize these writes. Aggregating the writes from multiple Profile-Entity pairs in one bolt allows them to be batched, for example.</p></li>
+</ul></div>
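+<p>The splitting step can be sketched in plain Python. Here each profile&#x2019;s <tt>onlyif</tt> and <tt>foreach</tt> are modeled as Python callables rather than Stellar expressions, and grouping the emitted tuples by profile and entity assumes something like a Storm fields grouping; the names are hypothetical and this is not Metron&#x2019;s <tt>ProfileSplitterBolt</tt>.</p>

```python
# Hypothetical sketch of the splitter's routing; callables stand in for Stellar.
from collections import defaultdict

profiles = {
    "example1": {"onlyif": lambda m: m["protocol"] == "HTTP",
                 "foreach": lambda m: m["ip_src_addr"]},
    "example2": {"onlyif": lambda m: m["protocol"] in ("DNS", "HTTP"),
                 "foreach": lambda m: m["ip_src_addr"]},
}


def split(message):
    """Emit one (entity, profile name, message) tuple per matching profile."""
    return [(p["foreach"](message), name, message)
            for name, p in profiles.items() if p["onlyif"](message)]


def route(messages):
    """Group emitted tuples by (profile, entity), as a fields grouping might."""
    groups = defaultdict(list)
    for message in messages:
        for entity, name, msg in split(message):
            groups[(name, entity)].append(msg)
    return dict(groups)


if __name__ == "__main__":
    sample = [
        {"ip_src_addr": "10.0.0.1", "protocol": "HTTPS"},  # matches no profile
        {"ip_src_addr": "10.0.0.2", "protocol": "HTTP"},   # matches both
        {"ip_src_addr": "10.0.0.3", "protocol": "DNS"},    # matches example2
    ]
    print(sorted(route(sample)))
```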
+                  </div>
+            </div>
+          </div>
+
+    <hr/>
+
+    <footer>
+            <div class="container-fluid">
+              <div class="row span12">Copyright &copy;                    2017
+                        <a href="https://www.apache.org">The Apache Software Foundation</a>.
+            All Rights Reserved.      
+                    
+      </div>
+
+                          
+        
+                </div>
+    </footer>
+  </body>
+</html>

