I suppose size is relative to the amount of data you have, too. 
I don't quite have 1000 hosts, but OSSEC reports 1287103.2 events per hour. 
Yes, nearly 1.3 million events per hour. 

I'm pretty certain that I've hit the OS cap on UDP packet volume during 
peak sending times. This usually happens when Syscheck runs during 
already busy periods. 

I would recommend making sure your OSSEC server(s) have an increased UDP 
buffer size at the OS level (sysctl -A lists the current kernel settings). 
That's of course on top of the previously mentioned agent size increase. 
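
As a rough way to see whether the OS limit is the bottleneck, here is a 
minimal sketch that asks for a large UDP receive buffer and reports what 
the kernel actually granted. The function name and the 8 MiB target are my 
own illustrative choices, not anything from OSSEC. On Linux the granted 
size is capped by net.core.rmem_max and is typically reported as double 
the stored value, so treat the number as a hint only.

```python
import socket


def granted_udp_rcvbuf(requested_bytes: int) -> int:
    """Request a UDP receive buffer size and return what the OS granted.

    Hypothetical helper for illustration; it opens a throwaway socket and
    does not touch any running OSSEC process.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, requested_bytes)
        return sock.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF)


if __name__ == "__main__":
    requested = 8 * 1024 * 1024  # 8 MiB, an arbitrary illustrative target
    granted = granted_udp_rcvbuf(requested)
    print(f"requested {requested} bytes, OS granted {granted} bytes")
```

If the granted size comes back far below the request, the OS-wide maximum 
is probably what needs raising.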

I've had to be creative with syscheck, breaking it up into chunks, because 
otherwise I potentially miss data (a downside of UDP). I have also split 
agents across separate OSSEC servers, though the downside is that you 
can't easily take data from one OSSEC server environment and perform 
active response on another. I do have a feature request open to allow 
syscheck to run during a random window, so you can spread out the volume 
of results sent back to the server. 
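
Until something like that exists, the spreading can be approximated from 
the outside. Below is a minimal sketch, assuming you generate per-host 
scan times yourself (for example from your config management tool): it 
hashes the hostname into a stable offset inside a window. The function 
name, the hostnames, and the 6-hour window are hypothetical; this is not 
an OSSEC feature.

```python
import hashlib


def syscheck_offset_seconds(hostname: str, window_seconds: int) -> int:
    """Map a hostname to a stable offset in the range [0, window_seconds).

    The same hostname always gets the same offset, so each host scans at
    a predictable time while the fleet as a whole is spread out.
    """
    if window_seconds <= 0:
        raise ValueError("window_seconds must be positive")
    digest = hashlib.sha256(hostname.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % window_seconds


if __name__ == "__main__":
    window = 6 * 3600  # spread scans over a 6-hour window (illustrative)
    for host in ("web01", "web02", "db01"):  # made-up hostnames
        offset = syscheck_offset_seconds(host, window)
        print(f"{host}: start {offset // 3600:02d}:{(offset % 3600) // 60:02d} into the window")
```

Hashing rather than picking a random number keeps the schedule stable 
across runs, which makes it easier to tell a missed scan from a moved one.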

Also, agent disconnected alerts in volume are an indication of UDP 
traffic problems, so I would recommend keeping the rule, but perhaps 
adding a threshold rule. 
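
To illustrate what I mean by a threshold, here is a minimal sketch of the 
logic only: stay quiet on a single disconnect, but fire once a given 
number of them land inside a time window. The class name and the numbers 
are hypothetical, and this is not OSSEC rule syntax, just the idea.

```python
from collections import deque


class DisconnectThreshold:
    """Fire only when max_events disconnects occur within timeframe_seconds."""

    def __init__(self, max_events: int, timeframe_seconds: float) -> None:
        self.max_events = max_events
        self.timeframe_seconds = timeframe_seconds
        self._timestamps = deque()

    def record(self, timestamp: float) -> bool:
        """Record one disconnect; return True when the threshold is met."""
        self._timestamps.append(timestamp)
        cutoff = timestamp - self.timeframe_seconds
        # Drop events that have fallen out of the sliding window.
        while self._timestamps and self._timestamps[0] < cutoff:
            self._timestamps.popleft()
        return len(self._timestamps) >= self.max_events


if __name__ == "__main__":
    threshold = DisconnectThreshold(max_events=3, timeframe_seconds=60)
    for t in (0, 10, 20, 200):  # made-up event times in seconds
        print(t, threshold.record(t))
```

That way an isolated disconnect stays out of your inbox, while a burst of 
them still gets flagged as a likely sign of dropped traffic.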

Hope that helps.


On Friday, April 6, 2012 7:39:54 AM UTC-7, Zate wrote:
>
> That helps immensely, it's pretty much exactly what we are looking to 
> build, right down to us using splunk and puppet and about the same amount 
> of hosts in the same amount of locations.
>
> Thanks a lot.
>
> Zate
>
>
> On Sun, Apr 1, 2012 at 6:22 AM, Kat <[email protected]> wrote:
>
>> 4 installs --
>> 1700 hosts
>> 1200 hosts
>> 1340 hosts
>> and 900 (oops, that is not over 1000, but close)
>>
>> Use puppet to manage deployments rather than OSSEC itself. Also,
>> puppet maintains more than just agent.conf. Splunk on the backend with
>> "Splunk for OSSEC"  app handling all the "details".  Also, because
>> this was a large mixed platform of Linux, HP-UX, AIX, Solaris and
>> Windoze, puppet made things much easier.
>>
>> Biggest problem was the constant alerts of disconnected agents, when
>> they really weren't. This was caused mostly by the load and short
>> check times in the agent/server codes. I found some patches to bump
>> that up, but in the beginning I just disabled the "Agent disconnected"
>> rules, which also worked.
>>
>> ** Maybe a note to developers -- as the agent count goes up - set up
>> check-in timers that go up with the agent count. It would avoid a lot
>> of false-positives on these alerts.
>>
>> My biggest issue was with reporting, which is why Splunk was added to
>> the mix. This gives the flexibility needed to support both SOC type
>> engineers as well as auditors' requests, and once the reports are
>> defined, they can modify them easily enough for their needs with just
>> a little training.
>>
>> Hope this helps - if you have questions, just ask and I will try to
>> answer.
>>
>> ~K
>>
>
>
