Re: Network Storage

2012-04-16 Thread Simon Leinen
Andrew Thrift writes:
 If you want something from a Tier1 the new Dell R720XD's will take 24x
 900GB SAS disks

or 12x 2TB 3.5" cheap & slow SATA disks
or 12x 3TB 3.5" more expensive & slightly faster SAS disks

- if you take the (cheaper) 3.5"-disk variant of the R720xd chassis.

or 12x 3TB 3.5" cheap & slow SATA disks if you buy them directly rather
than from Dell.  (Presumably you'd have to buy Dell hot-swap trays.)
-- 
Simon.

 and have 16 cores.  If you order it with a SAS6-HBA you can add up to
 8 trays of 24 x 900GB SAS disks to provide 194TB of raw space at quite
 a reasonable cost.



RE: Network Storage

2012-04-16 Thread Drew Weaver
I'd like to point out that you can actually do 26 2.5" disks on an R720xd if 
you use the FlexBay, plus an SD card for your OS install, if you're being a 
maximalist. =)

-Drew






Re: Network Storage

2012-04-15 Thread Julien Goodwin
On 13/04/12 06:25, Maverick wrote:
 Can you please comment on what the best solution is for storing network
 traffic? We have been graciously granted access by our network
 administrator to capture traffic, but the one terabyte of disk space is
 no match for the data that we are seeing, so it fills up quickly. We
 can't get additional space on the server itself, so I am looking for
 some external solutions. Can you please suggest something that would
 be best for Gbps speeds?

In terms of tools, something shiny that I've not had a chance to play
with yet that is designed for this is Security Onion, an Ubuntu-based
Linux distribution that bundles a bunch of tools for doing this
sort of thing.

http://securityonion.blogspot.com/



Re: Network Storage

2012-04-15 Thread Leo Bicknell
In a message written on Thu, Apr 12, 2012 at 05:16:27PM -0400, Maverick wrote:
 1) My goal is to store the traffic maybe forever, and analyze it in
 the future for security related incidents detected by ids/ips.

Let's just assume you have enough disk space that you can write out
every packet, or even just every packet header.  That's a hard problem,
but you've received plenty of suggestions on how to go down that
path.

Once you have that data, how are you going to process it?

Yes, disk reads are faster than disk writes, but not by that much.
If it takes you 24 hours to write a day of data to disk, it might
take you 12 hours just to read it all back off and process it.
Processing a week's worth of back data could take days.  And I'm
not even counting the CPU and memory necessary to build state
tables and statistical analysis tables to generate useful data.

There's a reason why most network traffic tools summarize early,
as early as on the network device when using NetFlow-type collection.
It's not just to save storage space on disk; it's also to make the
processing of the data fast enough that it can be done in a short
enough time that the data is still relevant when the processing is
complete.
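As a toy illustration of summarizing early, this Python sketch collapses per-packet header records into one counter record per 5-tuple. It is not NetFlow's actual export format and has no active/inactive flow timeouts; the input tuple layout and the names `FlowKey`, `FlowStats`, and `summarize` are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, NamedTuple, Tuple


class FlowKey(NamedTuple):
    src: str
    dst: str
    proto: int
    sport: int
    dport: int


@dataclass
class FlowStats:
    packets: int = 0
    octets: int = 0
    first: float = float("inf")
    last: float = float("-inf")


# (timestamp, src, dst, proto, sport, dport, length): hypothetical layout
Header = Tuple[float, str, str, int, int, int, int]


def summarize(headers: Iterable[Header]) -> Dict[FlowKey, FlowStats]:
    """Collapse per-packet header records into one counter record per 5-tuple."""
    flows: Dict[FlowKey, FlowStats] = defaultdict(FlowStats)
    for ts, src, dst, proto, sport, dport, length in headers:
        stats = flows[FlowKey(src, dst, proto, sport, dport)]
        stats.packets += 1
        stats.octets += length
        stats.first = min(stats.first, ts)
        stats.last = max(stats.last, ts)
    return dict(flows)


pkts = [
    (0.0, "10.0.0.1", "10.0.0.2", 6, 40000, 443, 1500),
    (0.1, "10.0.0.1", "10.0.0.2", 6, 40000, 443, 1500),
    (0.2, "10.0.0.3", "10.0.0.2", 17, 5353, 53, 80),
]
flows = summarize(pkts)
print(len(flows))  # 2 flow records stand in for 3 packet records
```

The saving grows with packets per flow: a long TCP transfer of thousands of packets still ends up as a single record.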

-- 
   Leo Bicknell - bickn...@ufp.org - CCIE 3440
PGP keys at http://www.ufp.org/~bicknell/




Re: Network Storage

2012-04-15 Thread George Herbert
On Thu, Apr 12, 2012 at 3:19 PM, Jared Mauch ja...@puck.nether.net wrote:
 You can also look at a machine like this:

 http://www.supermicro.com/products/chassis/4U/417/SC417E16-R1400U.cfm

 Jared Mauch

 On Apr 12, 2012, at 5:47 PM, Matthew Luckie m...@luckie.org.nz wrote:

 1) My goal is to store the traffic maybe forever, and analyze it in
 the future for security related incidents detected by ids/ips.

 Take a look at "Building a Time Machine for Efficient Recording and
 Retrieval of High-Volume Network Traffic"

 https://www.usenix.org/conference/imc-05/building-time-machine-efficient-recording-and-retrieval-high-volume-network

Just FYI, it's somewhat of a tossup on large arrays with 3.5"
and 2.5" models.  Equivalent 3.5" units hold 36-48 HDDs, and drive
sizes for enterprise SAS drives are 3 TB in 3.5" vs 1 TB in 2.5" now,
so you get more per box with 3.5" drives.  Also a lot cheaper in the
end.

About six months ago I purchased two similar boxes for nearline
backup purposes (lower bandwidth) with 3.5" drives: 34 x 3 TB plus a
couple of much faster 2.5" 15k boot drives;
post-RAID-10-and-hotspare-and-filesystem usable space was about 42 TB.
About $22k each.  One can go somewhat cheaper than that, but the VAR
had a good support story and just fixed it the next day when a RAID
card model didn't quite work out.
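A rough check of that usable-space figure in Python, assuming two hot spares (the post doesn't say how many; 34 minus the spares has to be even for RAID-10). The function names are illustrative. The remaining gap down to about 42 TB is plausibly filesystem overhead plus decimal-TB vs binary-TiB reporting.

```python
def raid10_usable_tb(drives: int, drive_tb: float, hot_spares: int) -> float:
    """Usable capacity (decimal TB) of one RAID-10 set, before filesystem overhead."""
    members = drives - hot_spares
    if members < 4 or members % 2:
        raise ValueError("RAID-10 needs an even number of members, at least 4")
    return members // 2 * drive_tb  # every block is stored twice


def tb_to_tib(tb: float) -> float:
    """Decimal terabytes (10**12 bytes) to binary tebibytes (2**40 bytes)."""
    return tb * 10**12 / 2**40


usable = raid10_usable_tb(drives=34, drive_tb=3, hot_spares=2)  # spares: assumed
print(usable, round(tb_to_tib(usable), 1))  # 48 43.7
```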


-- 
-george william herbert
george.herb...@gmail.com



Re: Network Storage

2012-04-15 Thread Andrew Thrift
If you want something from a Tier 1 vendor, the new Dell R720xd will take 24x 
900GB SAS disks and has 16 cores. If you order it with a SAS6-HBA, you 
can add up to 8 trays of 24 x 900GB SAS disks, which together with the 24 
internal disks gives about 194TB of raw space at quite a reasonable cost.

Alternatively, you could have a couple of probe servers connected to 
some nice fast SAN backend with redundant controllers. This will 
provide failover at the probe and storage levels but will cost a fair 
bit more :)

Regards,

Andrew







Network Storage

2012-04-12 Thread Maverick
Hello Everyone,

Can you please comment on what the best solution is for storing network
traffic? We have been graciously granted access by our network
administrator to capture traffic, but the one terabyte of disk space is
no match for the data that we are seeing, so it fills up quickly. We
can't get additional space on the server itself, so I am looking for
some external solutions. Can you please suggest something that would
be best for Gbps speeds?


Best,
Ali



Re: Network Storage

2012-04-12 Thread Joel jaeggli
Depends on the duration and goals of your capture...

1TB is 2.276 hours at 1Gb/s

If you need to capture it all and store it forever, well, sorry. If you
just need the flows and not the packets, sampled NetFlow can reduce
your requirements by many orders of magnitude; ultimately it really
depends on your goals. If you need to capture more data for a shorter
duration, write speed rather than capacity is probably the issue: 20Gb/s
is 2.5GB/s, which requires a pretty healthy disk array to write to disk...
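A quick way to sanity-check these figures, as a Python sketch (the function names are illustrative, not from any tool). It assumes a saturated link and that every byte is written to disk. The 2.276-hour figure comes out exactly when 1TB is read as 2^40 bytes and 1Gb/s as 2^30 bits/s; with decimal units it is about 2.22 hours.

```python
# Capture-sizing arithmetic: how long a disk lasts at line rate, and
# what sustained write rate a link implies.

TB, TIB = 10**12, 2**40      # decimal vs binary terabyte (bytes)
GBIT, GIBIT = 10**9, 2**30   # decimal vs binary gigabit (bits/s)


def hours_to_fill(capacity_bytes: float, rate_bits_per_s: float) -> float:
    """Hours until capacity_bytes is full at a sustained rate_bits_per_s."""
    return capacity_bytes * 8 / rate_bits_per_s / 3600


def write_rate_bytes_per_s(rate_bits_per_s: float) -> float:
    """Disk write rate (bytes/s) needed to keep up with a fully loaded link."""
    return rate_bits_per_s / 8


print(round(hours_to_fill(TIB, GIBIT), 3))        # 2.276  (binary units)
print(round(hours_to_fill(TB, GBIT), 3))          # 2.222  (decimal units)
print(write_rate_bytes_per_s(20 * GBIT) / 10**9)  # 2.5    (GB/s for 20Gb/s)
```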





Re: Network Storage

2012-04-12 Thread Michael J McCafferty
Ali,
Do you need to capture the whole packet, including the payload? You
will save a lot of space by just capturing the headers. For example,
tcpdump doesn't capture the whole packet by default anyway. You may not
be able to capture at line rate anyway depending on what you are using
to capture with (drivers, libraries, software, etc). See the -s option
in tcpdump man page for info.
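To make the snapshot-length point concrete, here is a minimal Python sketch that writes the classic libpcap file format: a 24-byte file header, then a 16-byte record header before each packet, storing at most `snaplen` bytes of the packet plus its original length. `write_pcap` is a hypothetical helper, not part of tcpdump or libpcap, and the 96-byte snaplen is just an example value.

```python
import io
import struct
from typing import BinaryIO, Iterable, Tuple

PCAP_MAGIC = 0xA1B2C3D4  # classic libpcap magic (microsecond timestamps)
LINKTYPE_ETHERNET = 1


def write_pcap(out: BinaryIO, packets: Iterable[Tuple[float, bytes]],
               snaplen: int = 96) -> None:
    """Write (timestamp, frame) pairs as a classic pcap stream, storing at
    most `snaplen` bytes of each frame plus its original length."""
    # File header: magic, version 2.4, thiszone, sigfigs, snaplen, link type.
    out.write(struct.pack("<IHHiIII", PCAP_MAGIC, 2, 4, 0, 0,
                          snaplen, LINKTYPE_ETHERNET))
    for ts, frame in packets:
        kept = frame[:snaplen]  # the truncation a snapshot length performs
        sec, usec = divmod(int(round(ts * 1_000_000)), 1_000_000)
        # Record header: ts_sec, ts_usec, captured length, original length.
        out.write(struct.pack("<IIII", sec, usec, len(kept), len(frame)))
        out.write(kept)


buf = io.BytesIO()
write_pcap(buf, [(1334265900.5, bytes(1500)), (1334265900.75, bytes(60))])
print(len(buf.getvalue()))  # 212 = 24 + (16 + 96) + (16 + 60)
```

With a 96-byte snaplen a 1500-byte frame costs 112 bytes on disk instead of 1516; the fixed 16-byte record header is why small packets save much less.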

Good luck,
Mike


-- 

Michael J. McCafferty
CEO
M5 Hosting
http://www.m5hosting.com

Like us on Facebook for updates and photos:
https://www.facebook.com/m5hosting





Re: Network Storage

2012-04-12 Thread Maverick
Thank you very much for your suggestions.

1) My goal is to store the traffic maybe forever, and analyze it in
the future for security related incidents detected by ids/ips.

2) I am storing just the header and the first few bytes, but it still
fills up quite quickly.

3) Netflow approach is nice but I also want to have traces available
for reasons mentioned in 1).

4) Are there any issues with having external storage as a solution for
this problem?

Best,
Ali





RE: Network Storage

2012-04-12 Thread Ian McDonald
Hi,
You'll need to build an array that'll random read/write upwards of 200MB/s if 
you want to get a semi-reliable capture to disk. That means SSD if you're very 
rich, or many spindles (preferably 15k's) in a stripe/RAID10 if you're 
building from your scrap pile. Bear in mind that write cache won't help you, as 
the I/O isn't going to be bursty, rather a continuous stream.

Another great help is scoping what you're looking for and pre-processing before 
writing out only the 'interesting' bits, thus reducing the io requirement. It 
does depend what you're trying to do, as headers can be adequate for many 
applications.

Aligning your partitions with the physical disk geometry can produce surprising 
speedups, as can stripe block size changes, but that's generally empirical, and 
depends on your workload.

--
ian




Re: Network Storage

2012-04-12 Thread John T. Yocum
In that case, just keep adding disks to your capture system, or use a NAS 
to do it.


--John









Re: Network Storage

2012-04-12 Thread Valdis . Kletnieks
On Thu, 12 Apr 2012 14:18:30 -0700, John T. Yocum said:
 In that case, just keep adding disks to your capture system, or use a NAS
 to do it.

On Thu, 12 Apr 2012 13:43:49 -0700, Joel jaeggli said:
 1TB is 2.276 hours at 1Gb/s

If he's got a gigabit of traffic, he's going to be adding another shelf of 12 1TB
drives to that NAS - every day.  If he gets the high-density shelves with 60
drives, he's only adding one a week.

He's going to have to work smarter, not harder.




Re: Network Storage

2012-04-12 Thread John T. Yocum



On 4/12/2012 2:34 PM, valdis.kletni...@vt.edu wrote:

On Thu, 12 Apr 2012 14:18:30 -0700, John T. Yocum said:

In that case, just keep adding disks to your capture system, or use a NAS
to do it.


On Thu, 12 Apr 2012 13:43:49 -0700, Joel jaeggli said:

1TB is 2.276 hours at 1Gb/s


If he's got a gigabit of traffic, he's going to be adding another shelf of 12 1TB
drives to that NAS - every day.  If he gets the high-density shelves with 60
drives, he's only adding one a week.
he's only adding one a week.

He's going to have to work smarter, not harder.


He did indicate he's only storing the headers and a few bytes, not the 
full payload.


--John



Re: Network Storage

2012-04-12 Thread Dan Olson
If this is just for post-analysis and you have another system (IDS) to identify 
the timeframe, a tape-based system might be a better approach, especially if you 
want to retain it forever.
Maybe a tape library with LTFS.






Re: Network Storage

2012-04-12 Thread Matthew Luckie
 1) My goal is to store the traffic maybe forever, and analyze it in
 the future for security related incidents detected by ids/ips.

Take a look at "Building a Time Machine for Efficient Recording and
Retrieval of High-Volume Network Traffic"

https://www.usenix.org/conference/imc-05/building-time-machine-efficient-recording-and-retrieval-high-volume-network



Re: Network Storage

2012-04-12 Thread Joel M Snyder

Can you please comment on what the best solution is for storing network
traffic?

Well, "best" is kind of a hard word to use here.  There are lots of 
different solutions depending on exactly why and where you want to 
capture this.


As far as I know, there are really two credible companies who are 
thrashing it out in this space right now, NetWitness (now part of RSA) 
and Solera.  I think that Niksun is still out there, but they haven't 
done much recently or maybe they just concentrate on particular sectors 
and so I never see them.


Of course, you can also just tcpdump it yourself, but the commercial 
products do a lot of the metadata analysis and creation for you, so it's 
a lot easier to understand what is happening in your traffic than just 
having piles of tcpdumps.


I bought a NetWitness box and was profoundly unimpressed.  So I guess my 
advice would be to start with Solera and then look at NetWitness if you 
don't like Solera.


This assumes you have budget.  If this is a back-of-the-envelope "hey, 
let's grab some packets and do something with them" kind of exercise, 
then filter your tcpdumps a lot better.


jms

--
Joel M Snyder, 1404 East Lind Road, Tucson, AZ, 85719
Senior Partner, Opus One   Phone: +1 520 324 0494
j...@opus1.com  http://www.opus1.com/jms



Re: Network Storage

2012-04-12 Thread Michael J McCafferty

more in-line...

On Thu, 2012-04-12 at 17:16 -0400, Maverick wrote:
 Thank you very much for your suggestions.
 
 1) My goal is to store the traffic maybe forever, and analyze it in
 the future for security related incidents detected by ids/ips.
 

The poor man's way to do this is to use the space you have and use the
-C and -W options in tcpdump. You have as much history as you have disk
space. Maybe make 500M files, and a count of 1800 to use 900G of disk
space. When you have an event, you copy off the files that are relevant
to the time period of the events, to a workstation. Another option is -G
for rotating the files by time instead of size.
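With tcpdump that would be -C 500 -W 1800 (the -C unit is millions of bytes, so 500 x 1,000,000 x 1800 = 900 GB). The Python sketch below is only a rough model of that ring-buffer behavior, not tcpdump's code, and the class name and file naming are hypothetical: it rotates to a new file before the current one would exceed `max_bytes`, and deletes the oldest file once more than `max_files` exist.

```python
import os
import tempfile
from collections import deque


class RingBufferWriter:
    """Size-based rotation over a fixed number of files: a rough model of
    tcpdump's -C (file size) / -W (file count) ring buffer, not its code."""

    def __init__(self, directory: str, prefix: str, max_bytes: int, max_files: int):
        self.directory, self.prefix = directory, prefix
        self.max_bytes, self.max_files = max_bytes, max_files
        self.seq = 0
        self.files = deque()  # paths currently on disk, oldest first
        self._open_next()

    def _open_next(self) -> None:
        path = os.path.join(self.directory, f"{self.prefix}{self.seq:05d}")
        self.seq += 1
        self.files.append(path)
        if len(self.files) > self.max_files:
            os.remove(self.files.popleft())  # oldest file falls off the ring
        self.fh = open(path, "wb")
        self.written = 0

    def write(self, record: bytes) -> None:
        # Rotate before a record would push the current file past max_bytes.
        if self.written and self.written + len(record) > self.max_bytes:
            self.fh.close()
            self._open_next()
        self.fh.write(record)
        self.written += len(record)

    def close(self) -> None:
        self.fh.close()


with tempfile.TemporaryDirectory() as d:
    ring = RingBufferWriter(d, "cap", max_bytes=100, max_files=3)
    for _ in range(10):
        ring.write(b"x" * 60)
    ring.close()
    print(sorted(os.listdir(d)))  # ['cap00007', 'cap00008', 'cap00009']
```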

 2) I am storing just the header and the first few bytes, but it still
 fills up quite quickly.

You can use the -z option (e.g. -z gzip) to gzip-compress each file as it
is rotated, to save space. However, I don't know how this will affect your
disk I/O: will it be fast enough to keep up with writing the raw data while
doing a concurrent gzip of the last file? If you have enough hardware
performance but are limited on space, then it's worth a shot.
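One way to measure the cost before trying it live: a small Python sketch that does roughly what `gzip FILE` does to each rotated file, streaming it through the standard `gzip` module and removing the original. `gzip_in_place` is a hypothetical helper; level 6 is the gzip command-line default.

```python
import gzip
import os
import shutil


def gzip_in_place(path: str, level: int = 6) -> str:
    """Compress a closed capture file to path + '.gz' and remove the original,
    much as running `gzip FILE` on each rotated file would."""
    gz_path = path + ".gz"
    with open(path, "rb") as src, gzip.open(gz_path, "wb", compresslevel=level) as dst:
        shutil.copyfileobj(src, dst, 1 << 20)  # stream in 1 MiB chunks
    os.remove(path)
    return gz_path
```

Lowering `level` toward 1 trades compression ratio for CPU; whether either setting keeps up with the capture rate is something to measure on the actual hardware.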

 
 3) Netflow approach is nice but I also want to have traces available
 for reasons mentioned in 1).
 
 4) Are there any issues with having external storage as a solution for
 this problem?

There is also some advice in the man page for tcpdump regarding the -z
option. You can write a shell script that takes the capture file as the
only argument, to do other stuff you want done... in this case, copy the
file off to another drive. It could be a network location too... of
course, don't forget to not capture *that* traffic (feedback!).
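A hypothetical post-rotate hook along those lines, sketched in Python rather than shell: tcpdump runs the -z command with the closed capture file as its argument, and this script copies the file to an archive directory under a temporary name, renames it into place, and deletes the local copy. The `/mnt/archive` path is an assumption standing in for wherever the external or NAS volume is mounted; if it is a network mount, the capture filter still needs to exclude that traffic.

```python
#!/usr/bin/env python3
"""Hypothetical post-rotate hook for tcpdump's -z option: called once per
closed capture file, with the file name as argv[1]."""
import os
import shutil
import sys

ARCHIVE_DIR = "/mnt/archive"  # assumption: external/NAS volume mounted here


def archive(path: str, dest_dir: str = ARCHIVE_DIR) -> str:
    """Move a closed capture file to dest_dir; return its new path."""
    os.makedirs(dest_dir, exist_ok=True)
    dest = os.path.join(dest_dir, os.path.basename(path))
    tmp = dest + ".part"
    shutil.copy2(path, tmp)  # copy under a temporary name first...
    os.replace(tmp, dest)    # ...then rename, so no reader sees a half-copied file
    os.remove(path)          # free the space on the capture disk
    return dest


# Only act when invoked with exactly one argument (the capture file).
if __name__ == "__main__" and len(sys.argv) == 2:
    archive(sys.argv[1])
```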

 
 Best,
 Ali
 






Re: Network Storage

2012-04-12 Thread Nathan Stratton

On Thu, 12 Apr 2012, Maverick wrote:


Hello Everyone,

Can you please comment on what the best solution is for storing network
traffic? We have been graciously granted access by our network
administrator to capture traffic, but the one terabyte of disk space is
no match for the data that we are seeing, so it fills up quickly. We
can't get additional space on the server itself, so I am looking for
some external solutions. Can you please suggest something that would
be best for Gbps speeds?


I have done this two ways in the past. The first is the simple way: an LSI RAID 
card with lots of disks and some nice 10-gig capture cards. The second way is 
to use Gluster over a large number of hosts, with InfiniBand connecting 
them together.





Nathan Stratton
nathan at robotics.net
http://www.robotics.net



Re: Network Storage

2012-04-12 Thread Jared Mauch
You can also look at a machine like this:

http://www.supermicro.com/products/chassis/4U/417/SC417E16-R1400U.cfm

Jared Mauch

On Apr 12, 2012, at 5:47 PM, Matthew Luckie m...@luckie.org.nz wrote:

 1) My goal is to store the traffic maybe forever, and analyze it in
 the future for security related incidents detected by ids/ips.
 
 Take a look at "Building a Time Machine for Efficient Recording and
 Retrieval of High-Volume Network Traffic"
 
 https://www.usenix.org/conference/imc-05/building-time-machine-efficient-recording-and-retrieval-high-volume-network



Re: Network Storage

2012-04-12 Thread Jimmy Hess
On Thu, Apr 12, 2012 at 4:18 PM, Ian McDonald i...@st-andrews.ac.uk wrote:
 You'll need to build an array that'll random read/write upwards of 200MB/s if
 you want to get a semi-reliable capture to disk. That means SSD if you're very
 rich, or many spindles

Hey, saving packet captures to file is ~98% asynchronous writes and 2%
reads, and ~95% sequential activity. You might also think about applying
some variant of header compression to the packets during capture, to
trade a little CPU and increased RAM requirements for storage
efficiency.

The PCAP format, which saves raw packet header bits directly to disk,
is not necessarily among the most I/O- or space-efficient on-disk
storage formats you could pick.


Random writes should only occur if you are saving your captures to a
fragmented file system, which is not recommended; avoiding
fragmentation is important. Random reads aren't involved in
archiving data, only in analyzing it.

Do you make random reads into your saved capture files? More likely
you're doing a sequential scan, even during analysis; random reads
imply you have already indexed a dataset and are seeking a smaller
number of specific records to collect information about them.

Read requirements are totally dependent on your analysis workload,
e.g. table scan vs. index search. Depending on what the analysis is,
it may even make sense to make extra filtered copies of the data,
using more disk space, in order to avoid a random access pattern.

If you are building a database of analysis results from raw data, you
can use a separate random-I/O-optimized disk subsystem for the
stats database.


If you really need approximately 200 MB/s with some random read
performance for analysis, you should probably be looking at building
a RAID50 with several 4-drive sets and 1GB+ of writeback cache.

RAID10 makes more sense in situations where write requirements are not
sequential, when external storage is actually shared with multiple
applications, or when there is a requirement for a disk drive failure
to be truly transparent, but there is a huge capacity sacrifice in
choosing mirroring over parity.
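To put a number on the mirroring-vs-parity capacity tradeoff, a small Python comparison using the 4-drive RAID-5 sets suggested above. It counts capacity only (the 3 TB drive size is just an example; the function names are illustrative) and ignores write penalty, rebuild time, and failure tolerance.

```python
def raid10_usable(drives: int, size_tb: float) -> float:
    """Mirroring: half the raw capacity (an odd leftover drive is ignored)."""
    return drives // 2 * size_tb


def raid50_usable(drives: int, size_tb: float, set_size: int = 4) -> float:
    """RAID-5 sets striped together: each set of `set_size` drives gives up
    one drive's worth of space to parity (leftover drives are ignored)."""
    return drives // set_size * (set_size - 1) * size_tb


for n in (8, 16, 24):
    print(n, raid10_usable(n, 3), raid50_usable(n, 3))
# 8 12 18
# 16 24 36
# 24 36 54
```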


There is a time vs. cost tradeoff with regard to the analysis of the data.

When your analysis tools start reading data, the reads increase
disk access time and therefore reduce write performance, so the
reads should be throttled; and the higher the capacity of the
disk subsystem, the higher the cost.


Performing your analysis ahead of time via pre-caching, or at least
indexing newly captured data in small chunks on a continuous basis, may
be useful to minimize the amount of searching of the raw dataset
later. A small SSD or separate mirrored drive pair for that
function would avoid adding load to the raw capture storage
disk system, if your analysis requirements are amenable to that
pattern.

Modern OSes cache some recent filesystem data in RAM. So if the
server capturing data has sufficient SDRAM, analyzing data while
it's still hot in the page cache, and saving that analysis in an
efficient index for later use, can be useful.

 (preferably 15k's) in a stripe/ raid10 if you're building from your scrap 
 pile. Bear in mind that write cache won't help you, as the io isn't going to 
 be bursty, rather a continuous stream.

Not really... A good read cache is more important for the analysis,
but an efficient write cache on your array, plus the OS page cache, is
still highly beneficial, especially because it can ensure that your RAID
subsystem performs full-stripe writes for maximal efficiency of
sequential write activity; it can also delay the media write until the
optimal moment based on platter position, and sequence the read/write
requests.

As long as the storage system behind the cache can, on average, drain
the cache faster than you can fill it with data, the write cache serves
an important function.
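A toy Python model of that argument, in one-second steps with illustrative rates: the write-back cache absorbs a short spike as long as it is big enough to hold the backlog and the average arrival rate (140 MB/s here) stays below the drain rate. It ignores seek behavior, and `cache_overflow` is a hypothetical name.

```python
from typing import Iterable


def cache_overflow(arrivals_mb: Iterable[float], drain_mb: float, cache_mb: float) -> float:
    """Toy write-back cache model in one-second steps.

    Each step, the next arrival value (MB) lands in the cache, anything
    beyond cache_mb is counted as overflow (the writer would stall or drop
    packets), and then up to drain_mb MB are flushed to the disks behind it.
    """
    level = overflow = 0.0
    for incoming in arrivals_mb:
        level += incoming
        if level > cache_mb:
            overflow += level - cache_mb
            level = cache_mb
        level = max(0.0, level - drain_mb)
    return overflow


# 100 MB/s baseline with a 5-second 300 MB/s spike; disks drain 150 MB/s.
traffic = [100] * 10 + [300] * 5 + [100] * 10
print(cache_overflow(traffic, drain_mb=150, cache_mb=1024))  # 0.0
print(cache_overflow(traffic, drain_mb=150, cache_mb=256))   # 644.0
```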


Your I/O may be a continuous stream, but there are most certainly
variations and spikes in the rate of packets and the performance of
mechanical disk drives.


 Aligning your partitions with the physical disk geometry can produce 
 surprising speedups, as can stripe block size changes, but that's generally 
 empirical, and depends on your workload.


For RAID systems, partitions should absolutely be aligned if the OS
install defaults don't align them correctly; on a modern OS, the
defaults are normally OK. Having an unaligned or improperly aligned
partition is just a misconfiguration: a track crossing for every
other sector read is an easy way of doubling the size of small I/Os.
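A minimal Python check of whether a partition start is aligned, assuming 512-byte logical sectors. The 63-sector start is the old DOS/MBR default; current partitioning tools typically start at sector 2048 (1 MiB). The 64 KiB stripe unit is only an example value, and `is_aligned` is an illustrative name.

```python
def is_aligned(start_sector: int, boundary_bytes: int, sector_bytes: int = 512) -> bool:
    """True if a partition starting at start_sector begins on a boundary_bytes boundary."""
    return (start_sector * sector_bytes) % boundary_bytes == 0


STRIPE_BYTES = 64 * 1024  # example RAID stripe unit; use your controller's value
for start in (63, 2048):  # legacy DOS/MBR start vs. 1 MiB-aligned start
    print(start, is_aligned(start, 4096), is_aligned(start, STRIPE_BYTES))
# 63 False False
# 2048 True True
```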

You won't notice it in this particular use case, when you are writing
large blocks: if you're writing a 100Mb chunk asynchronously, you
won't notice a 63kB difference; it's well under 1% of your
transfer size; this is primarily a