Re: [pmacct-discussion] collecting large number of netflows

2016-08-19 Thread Jentsch, Mario
Sounds like you already have the DB server hardware.
It may be a good idea to simulate the data flows to and from your DB. A few 
scripts that insert data at different constant rates and/or intermittently, the 
way it normally arrives from nfacctd, can generate the input to the DB. At the 
same time you can prepare the next processing steps using the fake data. This 
should reveal bottlenecks and give you the chance to address them before they 
appear in the live system.
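
A minimal sketch of such a load simulator follows, in Python. It is only an illustration under stated assumptions: the record fields, the function names and the `sink` callback are all made up here, and the actual DB insert is left to whatever function you pass in as `sink` (for example a batched INSERT through your PostgreSQL driver). It produces either evenly spaced records or bursts that arrive all at once, roughly the way a collector flushes its cache at fixed intervals.

```python
import random
import time
from dataclasses import dataclass


@dataclass
class FakeFlow:
    """Hypothetical record layout; adapt it to your real table columns."""
    ts: float        # seconds since the start of the simulation
    src_ip: str
    dst_ip: str
    packets: int
    nbytes: int


def constant_schedule(rate_per_sec, duration_sec):
    """Yield evenly spaced send times for a constant insert rate."""
    total = int(rate_per_sec * duration_sec)
    for i in range(total):
        yield i / rate_per_sec


def bursty_schedule(rate_per_sec, duration_sec, flush_every_sec):
    """Yield send times where each interval's records arrive in one burst."""
    per_burst = int(rate_per_sec * flush_every_sec)
    t = flush_every_sec
    while t <= duration_sec:
        for _ in range(per_burst):
            yield float(t)
        t += flush_every_sec


def fake_flows(schedule, seed=0):
    """Turn a schedule of send times into random but reproducible records."""
    rng = random.Random(seed)

    def ip():
        return f"10.{rng.randrange(256)}.{rng.randrange(256)}.{rng.randrange(1, 255)}"

    for ts in schedule:
        yield FakeFlow(ts, ip(), ip(), rng.randint(1, 1000), rng.randint(64, 1_500_000))


def batches(flows, batch_size):
    """Group records into lists of at most batch_size items."""
    batch = []
    for flow in flows:
        batch.append(flow)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:
        yield batch


def replay(flows, sink, batch_size=1000, sleep=time.sleep):
    """Feed batches to sink(batch), waiting until each batch is due.

    Pass a no-op as `sleep` to run as fast as the sink allows.
    """
    sent = 0
    clock = 0.0
    for batch in batches(flows, batch_size):
        wait = batch[0].ts - clock
        if wait > 0:
            sleep(wait)
            clock = batch[0].ts
        sink(batch)
        sent += len(batch)
    return sent
```

For example, `replay(fake_flows(bursty_schedule(1000, 600, 60)), my_insert_function)` would replay ten minutes of traffic as one burst per minute, where `my_insert_function` is your own code. Running the same generator against the reporting queries at the same time is what exposes the read/write contention.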
For example, multiple NetFlow collectors that write to the same tablespace may 
lock each other and decrease insert performance. The same applies to reading the 
written data for further processing. Reducing the locks can be challenging; 
splitting the tablespace with partitioning, or using separate inbound tables per 
collector, can help.
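
One way to picture the per-collector inbound tables is a naming scheme that gives every collector its own table per day. The Python sketch below is an assumption for illustration only: the `flows_in` prefix, the `flows_template` table and both function names are hypothetical, and the generated statement relies on PostgreSQL's `CREATE TABLE ... (LIKE ...)` form.

```python
from datetime import datetime, timezone


def inbound_table(collector_id, when, prefix="flows_in"):
    """Return a table name unique to one collector and one day."""
    return f"{prefix}_c{collector_id:02d}_{when:%Y%m%d}"


def create_statement(table, template="flows_template"):
    """Build DDL that clones the column layout of a (hypothetical) template table."""
    return f"CREATE TABLE IF NOT EXISTS {table} (LIKE {template} INCLUDING ALL);"


if __name__ == "__main__":
    day = datetime(2016, 8, 19, tzinfo=timezone.utc)
    for collector_id in (1, 2, 3):
        print(create_statement(inbound_table(collector_id, day)))
```

With this layout each collector only ever writes to its own table, and a later processing step can read a finished day's tables without competing with the writers.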

Good luck!
Mario

> -Original Message-
> From: pmacct-discussion [mailto:pmacct-discussion-boun...@pmacct.net]
> On Behalf Of Stephen Clark
> Sent: Thursday, August 18, 2016 2:24 PM
> To: pmacct-discussion@pmacct.net
> Subject: Re: [pmacct-discussion] collecting large number of netflows
> 
> On 08/17/2016 08:38 AM, Jentsch, Mario wrote:
> > Hey Steve,
> >
> > That question can't be answered without a lot of assumptions about the
> > details of your project, and in our experience it is hard to predict even
> > with project details, due to the nature of network traffic patterns.
> > pmacct (namely nfacctd) can handle that number of flows, even with only
> > one instance, and is most probably not the bottleneck. Whether what you
> > plan to do is possible depends on questions like "how many records per
> > timebin do you have after aggregation in nfacctd?" (this is what your
> > backend DB has to handle) and "how is this data processed later on?"
> > (this has more or less impact on DB performance and on the time it takes
> > to create reports or feed any user interfaces).
> >
> > Regards,
> > Mario
> Hi Mario,
> 
> Thanks for the response. We will be collecting data from about 200 probes.
> This is a new endeavor, so I guess we will be learning on the fly. We are
> planning on using the fsrc sampling feature set at 20 flows per minute,
> with inserts only, into a PostgreSQL 9.4 DB running on CentOS 6.8 in
> VMware on a hefty Cisco UCS system.
> 
> Regards,
> Steve
> >> -Original Message-
> >> From: pmacct-discussion [mailto:pmacct-discussion-boun...@pmacct.net]
> >> On Behalf Of Stephen Clark
> >> Sent: Thursday, August 04, 2016 5:01 PM
> >> To: pmacct-discussion@pmacct.net
> >> Subject: [pmacct-discussion] collecting large number of netflows
> >>
> >> Hi List,
> >>
> >> I am looking to collect a large number of NetFlow records, on the order
> >> of 100 million a day, and store them in a Postgres DB. Has anyone done
> >> this or something similar using pmacct?
> >>
> >> Thanks,
> >> Steve
> 
> 
> ___
> pmacct-discussion mailing list
> http://www.pmacct.net/#mailinglists



Re: [pmacct-discussion] collecting large number of netflows

2016-08-18 Thread Stephen Clark

On 08/17/2016 08:38 AM, Jentsch, Mario wrote:

> Hey Steve,
>
> That question can't be answered without a lot of assumptions about the
> details of your project, and in our experience it is hard to predict even
> with project details, due to the nature of network traffic patterns.
> pmacct (namely nfacctd) can handle that number of flows, even with only
> one instance, and is most probably not the bottleneck. Whether what you
> plan to do is possible depends on questions like "how many records per
> timebin do you have after aggregation in nfacctd?" (this is what your
> backend DB has to handle) and "how is this data processed later on?"
> (this has more or less impact on DB performance and on the time it takes
> to create reports or feed any user interfaces).
>
> Regards,
> Mario

Hi Mario,

Thanks for the response. We will be collecting data from about 200 probes. This 
is a new endeavor, so I guess we will be learning on the fly. We are planning on 
using the fsrc sampling feature set at 20 flows per minute, with inserts only, 
into a PostgreSQL 9.4 DB running on CentOS 6.8 in VMware on a hefty Cisco UCS 
system.
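
For a rough sense of scale, the figures in this thread can be turned into rates. The Python sketch below is back-of-the-envelope arithmetic only. It assumes the cap of 20 flows per minute applies to each of the 200 probes separately, which the message does not actually state, and the function names are made up for illustration.

```python
SECONDS_PER_DAY = 24 * 60 * 60
MINUTES_PER_DAY = 24 * 60


def average_per_second(records_per_day):
    """Average arrival rate if the daily volume were spread evenly."""
    return records_per_day / SECONDS_PER_DAY


def sampled_per_day(probes, flows_per_minute_per_probe):
    """Upper bound on stored records per day under a per-probe cap (assumption)."""
    return probes * flows_per_minute_per_probe * MINUTES_PER_DAY


if __name__ == "__main__":
    # Unsampled volume from the original question: 100 million records a day.
    print(round(average_per_second(100_000_000)))   # about 1157 records per second
    # Sampled volume: 200 probes, at most 20 flows per minute each.
    print(sampled_per_day(200, 20))                  # 5760000 records per day
```

Real traffic is not evenly spread, so peak rates will be well above the average; the averages are only a starting point for sizing tests.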


Regards,
Steve

>> -Original Message-
>> From: pmacct-discussion [mailto:pmacct-discussion-boun...@pmacct.net]
>> On Behalf Of Stephen Clark
>> Sent: Thursday, August 04, 2016 5:01 PM
>> To: pmacct-discussion@pmacct.net
>> Subject: [pmacct-discussion] collecting large number of netflows
>>
>> Hi List,
>>
>> I am looking to collect a large number of NetFlow records, on the order
>> of 100 million a day, and store them in a Postgres DB. Has anyone done
>> this or something similar using pmacct?
>>
>> Thanks,
>> Steve







Re: [pmacct-discussion] collecting large number of netflows

2016-08-04 Thread Dariush Marsh-Mossadeghi
For that sort of volume I’d investigate using an ELK stack on the backend and 
de-coupling pmacct from storage and analysis with a message queue like RabbitMQ. 
I’ve used this approach (though not at that scale) and it works well.

ELK == Elasticsearch, Logstash, Kibana
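
The decoupling idea can be sketched in a few lines of Python. This is not how pmacct or RabbitMQ work internally: an in-process `queue.Queue` stands in for the broker, and `write_batch` is a hypothetical callback standing in for the storage backend. The point is only that the collecting side hands records to a bounded queue and a separate consumer writes them out in batches at its own pace.

```python
import queue
import threading

STOP = object()  # sentinel that tells the consumer to finish


def consumer(q, write_batch, batch_size=500):
    """Drain the queue and hand records to the backend in batches."""
    batch = []
    while True:
        item = q.get()
        if item is STOP:
            break
        batch.append(item)
        if len(batch) >= batch_size:
            write_batch(batch)
            batch = []
    if batch:
        write_batch(batch)  # flush the final partial batch


def run_pipeline(records, write_batch, batch_size=500, max_queued=10_000):
    """Push records through a bounded queue to a consumer thread."""
    q = queue.Queue(maxsize=max_queued)
    worker = threading.Thread(target=consumer, args=(q, write_batch, batch_size))
    worker.start()
    for record in records:
        q.put(record)  # blocks while the queue is full (back-pressure)
    q.put(STOP)
    worker.join()
```

With a real broker the queue also survives a slow or restarting backend, which an in-process queue cannot do; the sketch only shows the shape of the hand-off.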

> 
> 
> On 08/04/2016 11:48 AM, David McKen wrote:
>> For that type of scale, a SQL-like NoSQL DB such as Cassandra may work 
>> better for you.
>> 
>> On Thu, Aug 4, 2016 at 11:01 AM, Stephen Clark wrote:
>> Hi List,
>> 
>> I am looking to collect a large number of NetFlow records, on the order of 
>> 100 million a day, and store them in a Postgres DB. Has anyone done this or 
>> something similar using pmacct?
>> 
>> Thanks,
>> Steve
>> 
>> 
>> 
> 




Re: [pmacct-discussion] collecting large number of netflows

2016-08-04 Thread Stephen Clark

Hmm...

I don't think pmacct directly supports Cassandra, but it does support MongoDB.
Also, I would like to be able to filter/sample the data at the point of origin, 
but I don't think that is possible with pmacctd and the nfprobe module. It only 
seems to be possible at the collector, nfacctd.

Steve

On 08/04/2016 11:48 AM, David McKen wrote:
> For that type of scale, a SQL-like NoSQL DB such as Cassandra may work 
> better for you.
>
> On Thu, Aug 4, 2016 at 11:01 AM, Stephen Clark wrote:
>
>> Hi List,
>>
>> I am looking to collect a large number of NetFlow records, on the order
>> of 100 million a day, and store them in a Postgres DB. Has anyone done
>> this or something similar using pmacct?
>>
>> Thanks,
>> Steve







--

"They that give up essential liberty to obtain temporary safety,
deserve neither liberty nor safety."  (Ben Franklin)

"The course of history shows that as a government grows, liberty
decreases."  (Thomas Jefferson)



[pmacct-discussion] collecting large number of netflows

2016-08-04 Thread Stephen Clark

Hi List,

I am looking to collect a large number of NetFlow records, on the order of 
100 million a day, and store them in a Postgres DB. Has anyone done this or 
something similar using pmacct?


Thanks,
Steve

