Re: Metron HBase conditional enrichment

2017-05-25 Thread Matt Foley
That sheds light.  Thanks, Nick.
--Matt

From: Nick Allen 
Reply-To: "user@metron.apache.org" 
Date: Thursday, May 25, 2017 at 3:06 PM
To: "user@metron.apache.org" 
Subject: Re: Metron HBase conditional enrichment

Each topology has its own uber-jar that is built from all of its dependencies. 
 Its classpath is basically whatever is in the uber-jar.

That's why running with -pl against the project from which the uber jar is 
built should identify the Stellar functions available to each topology.

When running the REPL from a deployed instance of Metron it pulls in all of the 
jars deployed to /usr/metron/(version)/lib.

On May 25, 2017 3:29 PM, "Matt Foley" wrote:
Nick is correct that in any given environment, only Stellar functions defined 
in jars on the current classpath will be available.

When running with the maven exec:java plugin, as below, this means only jars 
declared (or transitively required) as dependencies to the given project.

In the installed environment, however, it is strictly a matter of the 
configured classpath.  I thought (I could be wrong), that all the metron jars 
are installed together (in /usr/metron//lib/* , in CentOS 7), and that 
all those jars will be added to the classpath for all topologies.

Nick, you’re much more familiar with the install stuff than I, please clarify 
if you can how the classpath is configured for the running topologies in the 
installed env.

At any rate, it is the classpath currently being run under that determines the 
availability of Stellar functions.
Thanks,
--Matt

From: Nick Allen
Reply-To: "user@metron.apache.org"
Date: Thursday, May 25, 2017 at 12:05 PM
To: "user@metron.apache.org"
Subject: Re: Metron HBase conditional enrichment

Correct me if I am wrong, Matt, but I believe that changing the project that 
you pass to the -pl switch will allow you to see exactly what Stellar functions 
would be available in each topology.  You just have to refer to whichever 
project drives the topology.

This might help answer Ali's previous question as to what functions are 
available where.  And of course, if some function is not available where it is 
needed, then it is simply a matter of changing the dependencies to make it 
available.

For example, these functions are available from the Profiler.

$ mvn exec:java \
-Dexec.mainClass="org.apache.metron.common.stellar.shell.StellarShell" \
-pl metron-analytics/metron-profiler
...
Stellar, Go!
Please note that functions are loading lazily in the background and will be 
unavailable until loaded fully.
[Stellar]>>> Functions loaded, you may refer to functions now...
%functions
ABS, APPEND_IF_MISSING, BIN, BLOOM_ADD, BLOOM_EXISTS, BLOOM_INIT, BLOOM_MERGE, 
CHOMP, CHOP, COUNT_MATCHES, DAY_OF_MONTH, DAY_OF_WEEK, DAY_OF_YEAR, 
DOMAIN_REMOVE_SUBDOMAINS, DOMAIN_REMOVE_TLD, DOMAIN_TO_TLD, ENDS_WITH, 
FILL_LEFT, FILL_RIGHT, FILTER, FORMAT, GET, GET_FIRST, GET_LAST, HLLP_ADD, 
HLLP_CARDINALITY, HLLP_INIT, HLLP_MERGE, IN_SUBNET, IS_DATE, IS_DOMAIN, 
IS_EMAIL, IS_EMPTY, IS_INTEGER, IS_IP, IS_URL, JOIN, LENGTH, LIST_ADD, 
MAAS_GET_ENDPOINT, MAAS_MODEL_APPLY, MAP, MAP_EXISTS, MAP_GET, MONTH, 
OUTLIER_MAD_ADD, OUTLIER_MAD_SCORE, OUTLIER_MAD_STATE_MERGE, 
PREPEND_IF_MISSING, PROFILE_FIXED, PROFILE_GET, PROFILE_WINDOW, 
PROTOCOL_TO_NAME, REDUCE, REGEXP_MATCH, SPLIT, STARTS_WITH, STATS_ADD, 
STATS_BIN, STATS_COUNT, STATS_GEOMETRIC_MEAN, STATS_INIT, STATS_KURTOSIS, 
STATS_MAX, STATS_MEAN, STATS_MERGE, STATS_MIN, STATS_PERCENTILE, 
STATS_POPULATION_VARIANCE, STATS_QUADRATIC_MEAN, STATS_SD, STATS_SKEWNESS, 
STATS_SUM, STATS_SUM_LOGS, STATS_SUM_SQUARES, STATS_VARIANCE, STRING_ENTROPY, 
SYSTEM_ENV_GET, SYSTEM_PROPERTY_GET, TO_DOUBLE, TO_EPOCH_TIMESTAMP, TO_FLOAT, 
TO_INTEGER, TO_LONG, TO_LOWER, TO_STRING, TO_UPPER, TRIM, URL_TO_HOST, 
URL_TO_PATH, URL_TO_PORT, URL_TO_PROTOCOL, WEEK_OF_MONTH, WEEK_OF_YEAR, YEAR


And these functions are available in the Enrichment topology.


$ mvn exec:java \
-Dexec.mainClass="org.apache.metron.common.stellar.shell.StellarShell" \
-pl metron-platform/metron-enrichment/
...
Stellar, Go!
Please note that functions are loading lazily in the background and will be 
unavailable until loaded fully.
[Stellar]>>> Functions loaded, you may refer to functions now...
[Stellar]>>> %functions
ABS, APPEND_IF_MISSING, BIN, BLOOM_ADD, BLOOM_EXISTS, BLOOM_INIT, BLOOM_MERGE, 
CHOMP, CHOP, COUNT_MATCHES, DAY_OF_MONTH, DAY_OF_WEEK, DAY_OF_YEAR, 
DOMAIN_REMOVE_SUBDOMAINS, DOMAIN_REMOVE_TLD, DOMAIN_TO_TLD, ENDS_WITH, 
ENRICHMENT_EXISTS, ENRICHMENT_GET, FILL_LEFT, FILL_RIGHT, FILTER, FORMAT, 
GEO_GET, GET, GET_FIRST, GET_LAST, HLLP_ADD, HLLP_CARDINALITY, HLLP_INIT, 

Re: Metron HBase conditional enrichment

2017-05-25 Thread Nick Allen
Each topology has its own uber-jar that is built from all of its
dependencies.  Its classpath is basically whatever is in the uber-jar.

That's why running with -pl against the project from which the uber jar is
built should identify the Stellar functions available to each topology.

When running the REPL from a deployed instance of Metron it pulls in all of
the jars deployed to /usr/metron/(version)/lib.

On May 25, 2017 3:29 PM, "Matt Foley"  wrote:

Nick is correct that in any given environment, only Stellar functions
defined in jars on the current classpath will be available.

When running with the Maven exec:java plugin, as below, this means only
jars declared (or transitively required) as dependencies of the given
project.

In the installed environment, however, it is strictly a matter of the
configured classpath.  I thought (I could be wrong) that all the Metron
jars are installed together (in /usr/metron//lib/* , in CentOS 7),
and that all those jars will be added to the classpath for all topologies.

Nick, you’re much more familiar with the install stuff than I am.  Please
clarify, if you can, how the classpath is configured for the running
topologies in the installed env.

At any rate, it is the classpath currently being run under that determines
the availability of Stellar functions.

Thanks,
--Matt

From: Nick Allen
Reply-To: "user@metron.apache.org"
Date: Thursday, May 25, 2017 at 12:05 PM
To: "user@metron.apache.org"
Subject: Re: Metron HBase conditional enrichment

Correct me if I am wrong, Matt, but I believe that changing the project
that you pass to the -pl switch will allow you to see exactly what Stellar
functions would be available in each topology.  You just have to refer to
whichever project drives the topology.

This might help answer Ali's previous question as to what functions are
available where.  And of course, if some function is not available where it
is needed, then it is simply a matter of changing the dependencies to make
it available.

For example, these functions are available from the Profiler.

$ mvn exec:java \
-Dexec.mainClass="org.apache.metron.common.stellar.shell.StellarShell" \
-pl metron-analytics/metron-profiler
...
Stellar, Go!
Please note that functions are loading lazily in the background and will be
unavailable until loaded fully.
[Stellar]>>> Functions loaded, you may refer to functions now...
%functions

ABS, APPEND_IF_MISSING, BIN, BLOOM_ADD, BLOOM_EXISTS, BLOOM_INIT,
BLOOM_MERGE, CHOMP, CHOP, COUNT_MATCHES, DAY_OF_MONTH, DAY_OF_WEEK,
DAY_OF_YEAR, DOMAIN_REMOVE_SUBDOMAINS, DOMAIN_REMOVE_TLD, DOMAIN_TO_TLD,
ENDS_WITH, FILL_LEFT, FILL_RIGHT, FILTER, FORMAT, GET, GET_FIRST, GET_LAST,
HLLP_ADD, HLLP_CARDINALITY, HLLP_INIT, HLLP_MERGE, IN_SUBNET, IS_DATE,
IS_DOMAIN, IS_EMAIL, IS_EMPTY, IS_INTEGER, IS_IP, IS_URL, JOIN, LENGTH,
LIST_ADD, MAAS_GET_ENDPOINT, MAAS_MODEL_APPLY, MAP, MAP_EXISTS, MAP_GET,
MONTH, OUTLIER_MAD_ADD, OUTLIER_MAD_SCORE, OUTLIER_MAD_STATE_MERGE,
PREPEND_IF_MISSING, PROFILE_FIXED, PROFILE_GET, PROFILE_WINDOW,
PROTOCOL_TO_NAME, REDUCE, REGEXP_MATCH, SPLIT, STARTS_WITH, STATS_ADD,
STATS_BIN, STATS_COUNT, STATS_GEOMETRIC_MEAN, STATS_INIT, STATS_KURTOSIS,
STATS_MAX, STATS_MEAN, STATS_MERGE, STATS_MIN, STATS_PERCENTILE,
STATS_POPULATION_VARIANCE, STATS_QUADRATIC_MEAN, STATS_SD, STATS_SKEWNESS,
STATS_SUM, STATS_SUM_LOGS, STATS_SUM_SQUARES, STATS_VARIANCE,
STRING_ENTROPY, SYSTEM_ENV_GET, SYSTEM_PROPERTY_GET, TO_DOUBLE,
TO_EPOCH_TIMESTAMP, TO_FLOAT, TO_INTEGER, TO_LONG, TO_LOWER, TO_STRING,
TO_UPPER, TRIM, URL_TO_HOST, URL_TO_PATH, URL_TO_PORT, URL_TO_PROTOCOL,
WEEK_OF_MONTH, WEEK_OF_YEAR, YEAR

And these functions are available in the Enrichment topology.

$ mvn exec:java \
-Dexec.mainClass="org.apache.metron.common.stellar.shell.StellarShell" \
-pl metron-platform/metron-enrichment/
...
Stellar, Go!
Please note that functions are loading lazily in the background and will be
unavailable until loaded fully.
[Stellar]>>> Functions loaded, you may refer to functions now...
[Stellar]>>> %functions

ABS, APPEND_IF_MISSING, BIN, BLOOM_ADD, BLOOM_EXISTS, BLOOM_INIT,
BLOOM_MERGE, CHOMP, CHOP, COUNT_MATCHES, DAY_OF_MONTH, DAY_OF_WEEK,
DAY_OF_YEAR, DOMAIN_REMOVE_SUBDOMAINS, DOMAIN_REMOVE_TLD, DOMAIN_TO_TLD,
ENDS_WITH, ENRICHMENT_EXISTS, ENRICHMENT_GET, FILL_LEFT, FILL_RIGHT,
FILTER, FORMAT, GEO_GET, GET, GET_FIRST, GET_LAST, HLLP_ADD,
HLLP_CARDINALITY, HLLP_INIT, HLLP_MERGE, IN_SUBNET, IS_DATE, IS_DOMAIN,
IS_EMAIL, IS_EMPTY, IS_INTEGER, IS_IP, IS_URL, JOIN, LENGTH, LIST_ADD,
MAAS_GET_ENDPOINT, MAAS_MODEL_APPLY, MAP, MAP_EXISTS, MAP_GET, MONTH,
OUTLIER_MAD_ADD, OUTLIER_MAD_SCORE, OUTLIER_MAD_STATE_MERGE,
PREPEND_IF_MISSING, PROFILE_FIXED, PROFILE_GET, PROFILE_WINDOW,
PROTOCOL_TO_NAME, REDUCE, REGEXP_MATCH, SPLIT, STARTS_WITH, STATS_ADD,
STATS_BIN, STATS_COUNT, STATS_GEOMETRIC_MEAN, STATS_INIT, STATS_KURTOSIS,
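
To see at a glance which functions differ between two topologies, the two
%functions listings can be compared as sets. The following is a minimal Python
sketch, not part of Metron; the helper name parse_functions is hypothetical,
and the two strings are only excerpts (ABS through GET_LAST) of the Profiler
and Enrichment listings shown above, so the result says nothing about the
rest of either list.

```python
def parse_functions(repl_output):
    """Split the comma-separated output of %functions into a set of names."""
    flat = repl_output.replace("\n", " ")
    return {name.strip() for name in flat.split(",") if name.strip()}


# Excerpt of the Profiler listing above (ABS through GET_LAST only).
profiler = """
ABS, APPEND_IF_MISSING, BIN, BLOOM_ADD, BLOOM_EXISTS, BLOOM_INIT, BLOOM_MERGE,
CHOMP, CHOP, COUNT_MATCHES, DAY_OF_MONTH, DAY_OF_WEEK, DAY_OF_YEAR,
DOMAIN_REMOVE_SUBDOMAINS, DOMAIN_REMOVE_TLD, DOMAIN_TO_TLD, ENDS_WITH,
FILL_LEFT, FILL_RIGHT, FILTER, FORMAT, GET, GET_FIRST, GET_LAST
"""

# Excerpt of the Enrichment listing above (ABS through GET_LAST only).
enrichment = """
ABS, APPEND_IF_MISSING, BIN, BLOOM_ADD, BLOOM_EXISTS, BLOOM_INIT, BLOOM_MERGE,
CHOMP, CHOP, COUNT_MATCHES, DAY_OF_MONTH, DAY_OF_WEEK, DAY_OF_YEAR,
DOMAIN_REMOVE_SUBDOMAINS, DOMAIN_REMOVE_TLD, DOMAIN_TO_TLD, ENDS_WITH,
ENRICHMENT_EXISTS, ENRICHMENT_GET, FILL_LEFT, FILL_RIGHT, FILTER, FORMAT,
GEO_GET, GET, GET_FIRST, GET_LAST
"""

# Names present in one excerpt but not in the other.
only_in_enrichment = sorted(parse_functions(enrichment) - parse_functions(profiler))
only_in_profiler = sorted(parse_functions(profiler) - parse_functions(enrichment))

print("only in enrichment:", only_in_enrichment)
print("only in profiler:", only_in_profiler)
```

For these excerpts, the Enrichment project adds ENRICHMENT_EXISTS,
ENRICHMENT_GET and GEO_GET, which is consistent with the point that the
available functions follow the dependencies of the project passed to -pl.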

Re: AWS deployment with 5 hosts.

2017-05-25 Thread Nick Allen
What host groups did ec2-34-210-245-155.us-west-2.compute.amazonaws.com get
assigned to?  You can look at the labels that get attached to the host in
the EC2 web interface.

I have a feeling it has something to do with the Ambari blueprint that is
used (see metron-deployment/roles/ambari_config/vars/small_cluster.yml).
You probably need to customize or define your own blueprint.
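
As a rough illustration of how a host can end up "mapped to multiple host
groups", here is a minimal Python sketch. It assumes that some host_type tags
each select one blueprint host group; the tag-to-group mapping, the group
names and the host names below are hypothetical and are not taken from
small_cluster.yml or from Ambari's own validation code.

```python
from collections import defaultdict


def hosts_in_multiple_groups(host_tags, tag_to_group):
    """Return {host: sorted group names} for hosts that fall into more than one group."""
    groups_by_host = defaultdict(set)
    for host, tags in host_tags.items():
        for tag in tags:
            group = tag_to_group.get(tag)
            if group is not None:  # tags such as "ec2" select no host group here
                groups_by_host[host].add(group)
    return {host: sorted(groups)
            for host, groups in groups_by_host.items()
            if len(groups) > 1}


# Hypothetical mapping from host_type tags to blueprint host groups.
tag_to_group = {
    "enrichment": "enrichment_group",
    "search": "search_group",
    "web": "web_group",
}

# Hypothetical hosts, tagged the way the edited playbook tags them.
host_tags = {
    "host-a": ["ambari_slave", "enrichment", "metron", "ec2", "zeppelin"],
    "host-b": ["ambari_slave", "web", "search", "ec2"],
}

conflicts = hosts_in_multiple_groups(host_tags, tag_to_group)
print(conflicts)
```

Under that assumption, a host tagged with both web and search would be
claimed by two host groups, which is the kind of conflict the blueprint
would need to be adjusted to avoid.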

WARNING:  Please know that the automated EC2 setup is only for disposable,
short-lived, development/testing purposes.  It is not at all secure.

If you want to run Metron in AWS for any period of time, a better
approach is to define your VPC, spin up your EC2 hosts, install Ambari,
then use Metron's MPack to install Metron.

On Thu, May 25, 2017 at 1:00 PM, Laurens Vets  wrote:

> Deploying the standard 10 instance setup works. However, for our current
> needs, 10 m4.xlarge instances seem overkill and we want to deploy Metron on
> only 5 hosts for now.
>
> I would think that editing metron/metron-deployment/amazon-ec2/playbook.yml
> would be enough. I changed the following:
>
> - include: tasks/create-hosts.yml host_count=1 host_type=sensors,ambari_master,ec2,monit
> - include: tasks/create-hosts.yml host_count=4 host_type=ambari_slave,ec2
> - include: tasks/create-hosts.yml host_count=1 host_type=pcap_server,monit,ec2
> - include: tasks/create-hosts.yml host_count=1 host_type=ambari_slave,enrichment,metron,ec2,zeppelin
> - include: tasks/create-hosts.yml host_count=2 host_type=ambari_slave,search,ec2
> - include: tasks/create-hosts.yml host_count=1 host_type=ambari_slave,web,ec2
>
> to:
>
> - include: tasks/create-hosts.yml host_count=1 host_type=sensors,ambari_master,ec2,monit
> - include: tasks/create-hosts.yml host_count=1 host_type=pcap_server,monit,ec2
> - include: tasks/create-hosts.yml host_count=1 host_type=ambari_slave,enrichment,metron,ec2,zeppelin
> - include: tasks/create-hosts.yml host_count=2 host_type=ambari_slave,ec2
> - include: tasks/create-hosts.yml host_count=1 host_type=ambari_slave,web,search,ec2
>
> But now deployment fails with:
>
> TASK [ambari_config : Install python-requests] ***
> ok: [ec2-34-211-7-200.us-west-2.compute.amazonaws.com] => {"attempts": 1,
> "changed": false, "msg": "", "rc": 0, "results":
> ["python-requests-2.6.0-3.el6.noarch providing python-requests is already
> installed"]}
>
> TASK [ambari_config : check if ambari-server is up on
> ec2-34-211-7-200.us-west-2.compute.amazonaws.com:8080] ***
> ok: [ec2-34-211-7-200.us-west-2.compute.amazonaws.com] => {"changed":
> false, "elapsed": 120, "path": null, "port": 8080, "search_regex": null,
> "state": "started"}
>
> TASK [ambari_config : Deploy cluster with Ambari;
> http://ec2-34-211-7-200.us-west-2.compute.amazonaws.com:8080] ***
> fatal: [ec2-34-211-7-200.us-west-2.compute.amazonaws.com]: FAILED! =>
> {"changed": false, "failed": true, "msg": "Ambari client exception
> occurred: Could not create cluster: request code 400,
>  request message {\n  \"status\" : 400,\n  \"message\" : \"Topology
> validation failed: org.apache.ambari.server.topology.InvalidTopologyException:
> The following hosts are mapped to multiple host groups: [
> ec2-34-210-245-155.us-west-2.compute.amazonaws.com]. Be aware that host
> names are converted to lowercase, case differences do not matter in Ambari
> deployments.\"\n}"}
> to retry, use: --limit @/root/metron/metron-deployment/amazon-ec2/playbook.retry
>
> PLAY RECAP ***
> ec2-34-210-137-42.us-west-2.compute.amazonaws.com : ok=41   changed=27   unreachable=0    failed=0
> ec2-34-210-245-155.us-west-2.compute.amazonaws.com : ok=47   changed=31   unreachable=0    failed=0
> ec2-34-211-3-80.us-west-2.compute.amazonaws.com : ok=17   changed=8    unreachable=0    failed=0
> ec2-34-211-7-200.us-west-2.compute.amazonaws.com : ok=48   changed=28   unreachable=0    failed=1
> ec2-35-165-165-255.us-west-2.compute.amazonaws.com : ok=41   changed=27   unreachable=0    failed=0
> ec2-54-70-66-181.us-west-2.compute.amazonaws.com : ok=41   changed=27   unreachable=0    failed=0
> localhost  : ok=18   changed=13   unreachable=0    failed=0
>
> Any idea what might be going on? Did I miss a setting somewhere else?
>


AWS deployment with 5 hosts.

2017-05-25 Thread Laurens Vets
Deploying the standard 10 instance setup works. However, for our current 
needs, 10 m4.xlarge instances seem overkill and we want to deploy Metron 
on only 5 hosts for now.


I would think that editing 
metron/metron-deployment/amazon-ec2/playbook.yml would be enough. I 
changed the following:


- include: tasks/create-hosts.yml host_count=1 host_type=sensors,ambari_master,ec2,monit
- include: tasks/create-hosts.yml host_count=4 host_type=ambari_slave,ec2
- include: tasks/create-hosts.yml host_count=1 host_type=pcap_server,monit,ec2
- include: tasks/create-hosts.yml host_count=1 host_type=ambari_slave,enrichment,metron,ec2,zeppelin
- include: tasks/create-hosts.yml host_count=2 host_type=ambari_slave,search,ec2
- include: tasks/create-hosts.yml host_count=1 host_type=ambari_slave,web,ec2


to:

- include: tasks/create-hosts.yml host_count=1 host_type=sensors,ambari_master,ec2,monit
- include: tasks/create-hosts.yml host_count=1 host_type=pcap_server,monit,ec2
- include: tasks/create-hosts.yml host_count=1 host_type=ambari_slave,enrichment,metron,ec2,zeppelin
- include: tasks/create-hosts.yml host_count=2 host_type=ambari_slave,ec2
- include: tasks/create-hosts.yml host_count=1 host_type=ambari_slave,web,search,ec2


But now deployment fails with:

TASK [ambari_config : Install python-requests] ***
ok: [ec2-34-211-7-200.us-west-2.compute.amazonaws.com] => {"attempts": 
1, "changed": false, "msg": "", "rc": 0, "results": 
["python-requests-2.6.0-3.el6.noarch providing python-requests is 
already installed"]}


TASK [ambari_config : check if ambari-server is up on 
ec2-34-211-7-200.us-west-2.compute.amazonaws.com:8080] ***
ok: [ec2-34-211-7-200.us-west-2.compute.amazonaws.com] => {"changed": 
false, "elapsed": 120, "path": null, "port": 8080, "search_regex": null, 
"state": "started"}


TASK [ambari_config : Deploy cluster with Ambari; 
http://ec2-34-211-7-200.us-west-2.compute.amazonaws.com:8080] ***
fatal: [ec2-34-211-7-200.us-west-2.compute.amazonaws.com]: FAILED! => 
{"changed": false, "failed": true, "msg": "Ambari client exception 
occurred: Could not create cluster: request code 400,
 request message {\n  \"status\" : 400,\n  \"message\" : \"Topology 
validation failed: 
org.apache.ambari.server.topology.InvalidTopologyException: The 
following hosts are mapped to multiple host groups: 
[ec2-34-210-245-155.us-west-2.compute.amazonaws.com]. Be aware that host 
names are converted to lowercase, case differences do not matter in 
Ambari deployments.\"\n}"}
to retry, use: --limit @/root/metron/metron-deployment/amazon-ec2/playbook.retry


PLAY RECAP ***
ec2-34-210-137-42.us-west-2.compute.amazonaws.com : ok=41   changed=27   unreachable=0    failed=0
ec2-34-210-245-155.us-west-2.compute.amazonaws.com : ok=47   changed=31   unreachable=0    failed=0
ec2-34-211-3-80.us-west-2.compute.amazonaws.com : ok=17   changed=8    unreachable=0    failed=0
ec2-34-211-7-200.us-west-2.compute.amazonaws.com : ok=48   changed=28   unreachable=0    failed=1
ec2-35-165-165-255.us-west-2.compute.amazonaws.com : ok=41   changed=27   unreachable=0    failed=0
ec2-54-70-66-181.us-west-2.compute.amazonaws.com : ok=41   changed=27   unreachable=0    failed=0
localhost  : ok=18   changed=13   unreachable=0    failed=0


Any idea what might be going on? Did I miss a setting somewhere else?


Re: Metron HBase conditional enrichment

2017-05-25 Thread Otto Fowler
I think most of those restricted functions are in the metron-management
section.


On May 25, 2017 at 07:27:24, Nick Allen (n...@nickallen.org) wrote:

> everywhere I can use Stellar DSL, all of the functions have been
> implemented and ready to use?

Generally, yes, you are right.

I vaguely remember a couple instances of functions that are useful in the
REPL only, but I cannot remember what those are right now.  Hopefully we
have those doc'd appropriately.



On Wed, May 24, 2017 at 10:38 PM, Ali Nazemian wrote:

> Hi Nick,
>
> I was not sure about the implementation, so does it generally mean
> everywhere I can use Stellar DSL, all of the functions have been
> implemented and ready to use?
>
> Cheers,
> Ali
>
> On Thu, May 25, 2017 at 2:52 AM, Nick Allen  wrote:
>
>> > can I do the concatenation on the fly at the enrichment level, so I
>> > don't need to store this temp field in Elasticsearch/HDFS.
>>
>> Sure, absolutely.
>>
>> > Moreover, I need to have a conditional enrichment to say if you
>> > couldn't find any match for "tenant_name+device_type+device_name" lookup
>> > for "tenant_name+device_type+default_device".
>>
>> Yes, you can.  You've got if/else, JOIN, IS_EMPTY, and others that should
>> make implementing this logic pretty easy.
>>
>>
>>
>>
>> On Tue, May 23, 2017 at 10:34 PM, Ali Nazemian wrote:
>>
>>> Hi,
>>>
>>> I was wondering how I can manage Stellar syntax to be aligned with the
>>> following structure for the HBase enrichment:
>>>
>>> HBase_row_key: tenant_name+device_type+device_name
>>>
>>> At a high level, I need to create a separate field via a post-parse
>>> Stellar function to be a concatenation of tenant_name, device_type and
>>> device_name. Let's call this field "key". Basically, I need to do the
>>> enrichment on the "key", which would correspond to the HBase row key.
>>> My first question is *can I do the concatenation on the fly at the
>>> enrichment level, so I don't need to store this temp field in
>>> Elasticsearch/HDFS*.
>>>
>>> Moreover, I need to have a conditional enrichment to say: if you couldn't
>>> find any match for "tenant_name+device_type+device_name", look up
>>> "tenant_name+device_type+default_device". The second question would be *how
>>> can I manage conditional enrichment like this one*. I would be really
>>> grateful if you could provide an example.
>>>
>>> Regards,
>>> Ali
>>>
>>
>>
>
>
> --
> A.Nazemian
>