Thanks Jon, I will look into the links you provided first thing tomorrow morning.
I installed NiFi and have it running on my node1. I suspected it would be part of the overall solution. I suspect that as sources increase I would consider installing it on the other VMs as well?

F

-------- Original message --------
From: "[email protected]" <[email protected]>
Date: 2017-09-06 11:05 PM (GMT-05:00)
To: [email protected]
Subject: Re: Clearing of data to start over

Can you clarify what issues you're having with bro? I would be happy to help get that working.

Re: Kafka brokers, you can easily add or remove these after the initial install in Ambari. See this for more details - https://community.hortonworks.com/questions/617/is-it-possible-to-add-another-kafka-broker-to-my-c.html

Adding another VM to a cluster is pretty straightforward - more details: https://docs.hortonworks.com/HDPDocuments/Ambari-1.6.1.0/bk_Monitoring_Hadoop_Book/content/monitor-chap2-4b_2x.html

As long as you can get the data onto the right Kafka topic, you should be good to go. I would suggest looking into NiFi, Logstash, rsyslog, etc.

Jon

On Wed, Sep 6, 2017 at 11:01 PM Frank Horsfall <[email protected]> wrote:

I'm on a roll with questions. I'm curious to see if I can relieve processing pressure by adding a new VM. Would you know how I would go about it?

Also, I would like to pull data from sources instead of having the sources push data to my site. Have you come across this scenario?

F

-------- Original message --------
From: Frank Horsfall <[email protected]>
Date: 2017-09-06 10:51 PM (GMT-05:00)
To: [email protected]
Subject: Re: Clearing of data to start over

Also, Laurens, you recommended making 3 Kafka brokers, but the install wizard would not let me. As a result, my node1 is currently the only broker. Would this cause a bottleneck? If so, is there a method to install and configure the 2 additional brokers post initial install?

kindest regards,
Frank

-------- Original message --------
From: Frank Horsfall <[email protected]>
Date: 2017-09-06 10:38 PM (GMT-05:00)
To: [email protected]
Subject: Re: Clearing of data to start over

Thanks Laurens and Nick. I want to let the queues run overnight to give us some possible insights into heap sizes etc. I currently have 3 VMs configured, each with 8 cores, 500 gigs of disk capacity, and 30 gigs of memory. Elasticsearch has been configured with a 10 gig Xmx, and I've set the Storm worker childopts to 7 gigs for now so it takes a while to max out and generate heap errors.

I deleted approx. 6 million events and shut off the data-generating apps. The idea is to see how much will be processed overnight.

One thing that has me puzzled is why my bro app isn't emitting events. I double-checked my config against what's recommended, but nothing is coming through. A mystery. lol

Also, I kept some notes during the whole process and want to share them if you are interested. Let me know.

Frank
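On the bro question, a quick first check is to watch whether anything is landing on the bro Kafka topic at all. This is only a sketch, assuming the HDP default Kafka install path, ZooKeeper reachable at node1:2181, and Metron's default topic name "bro":

  # assumptions: HDP default kafka-broker path, ZooKeeper on node1:2181, topic "bro"
  /usr/hdp/current/kafka-broker/bin/kafka-console-consumer.sh --zookeeper node1:2181 --topic bro --from-beginning

If nothing shows up there, the gap is between the bro plugin and Kafka rather than further down in Storm or Elasticsearch.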
-------- Original message --------
From: Laurens Vets <[email protected]>
Date: 2017-09-06 6:17 PM (GMT-05:00)
To: [email protected]
Cc: Frank Horsfall <[email protected]>
Subject: Re: Clearing of data to start over

Hi Frank,

If all your queues (Kafka/Storm) are empty, the following should work:

- Deleting your Elasticsearch indices:
  curl -X DELETE 'http://localhost:9200/snort_index_*'
  curl -X DELETE 'http://localhost:9200/yaf_index_*'
  etc...

- Deleting your Hadoop data:
  Become the hdfs user:
  sudo su - hdfs
  Show what's been indexed in Hadoop:
  hdfs dfs -ls /apps/metron/indexing/indexed/
  The output should probably show the following:
  /apps/metron/indexing/indexed/error
  /apps/metron/indexing/indexed/snort
  /apps/metron/indexing/indexed/yaf
  ...
  You can remove these with:
  hdfs dfs -rmr -skipTrash /apps/metron/indexing/indexed/error/
  hdfs dfs -rmr -skipTrash /apps/metron/indexing/indexed/snort/
  Or the individual files with:
  hdfs dfs -rmr -skipTrash /apps/metron/indexing/indexed/error/FILENAME

On 2017-09-06 13:59, Frank Horsfall wrote:

Hello all,

I have installed a 3-node system using the bare-metal CentOS 7 guideline:
https://cwiki.apache.org/confluence/display/METRON/Metron+0.4.0+with+HDP+2.5+bare-metal+install+on+Centos+7+with+MariaDB+for+Metron+REST

It has taken me a while to get all components working properly, and I left the yaf, bro, and snort apps running, so quite a lot of data has been generated. Currently, I have almost 18 million events identified in Kibana: 16+ million are yaf based, 2+ million are snort, and 190 events are my new squid telemetry. :) It looks like it still has a while to go before it catches up to the current day. I recently shut down the apps.

My questions are:

1. Is there a way to wipe all my data and indices clean so that I may now begin with a fresh dataset?
2. Is there a way to configure yaf so that its data is meaningful? It is currently creating what looks to be test data.
3. I have commented out the test snort rule, but it is still generating the odd record, which once again looks like test data. Can this be stopped as well?

Kindest regards,
Frank

--
Jon
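To confirm the cleanup took effect, it can help to list what is left on both sides before restarting the sensors. A minimal check, assuming Elasticsearch is on localhost:9200 as in the curl calls above:

  # list remaining Elasticsearch indices with their document counts and sizes
  curl -X GET 'http://localhost:9200/_cat/indices?v'

  # run as the hdfs user; an empty listing means the indexed HDFS data is gone as well
  hdfs dfs -ls /apps/metron/indexing/indexed/

Once both come back clean, re-enabling yaf, bro, and snort should start a fresh dataset.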
