Re: Managing very large files

2007-10-05 Thread Jorn Argelo

Steve Bertrand wrote:

man 1 split

(esp. -l)
  

That's probably the best option for a one-shot deal like this.  On the
other hand, Perl itself provides the ability to go through a file one
line at a time, so you could just read a line, operate, write a line (to
a new file) as needed, over and over, until you get through the whole
file.

The real problem would be reading the whole file into a variable (or even
multiple variables) at once.



This is what I am afraid of. Just out of curiosity, if I did try to read
the entire file into a Perl variable all at once, would the box panic,
or as the saying goes 'what could possibly go wrong'?

Steve
  


Check out Tie::File on CPAN. This Perl module treats every line in a
file as an array element, and an element is only loaded into memory
when it is requested. In other words, it works well with huge files
like this one, since the entire file is never loaded into memory at
once.


http://search.cpan.org/~mjd/Tie-File-0.96/lib/Tie/File.pm
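
For example, a minimal sketch of how it might be used (the file name is
made up, and this assumes the capture is a line-oriented text dump
rather than a raw pcap written with -w):

    #!/usr/bin/perl
    use strict;
    use warnings;
    use Fcntl 'O_RDONLY';
    use Tie::File;

    # Tie the huge file to an array; records are fetched lazily, so the
    # whole file is never read into memory at once.
    tie my @lines, 'Tie::File', 'capture.txt', mode => O_RDONLY
        or die "Cannot tie capture.txt: $!";

    for my $i ( 0 .. $#lines ) {
        # Operate on one line at a time, e.g. print lines mentioning a host.
        print "$lines[$i]\n" if $lines[$i] =~ /10\.0\.0\.1/;
    }

    untie @lines;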

Jorn




Re: Managing very large files

2007-10-05 Thread Steve Bertrand
 The reason for the massive file size was my haste in running out of the
 office on Friday and forgetting to kill the tcpdump process before the
 weekend began.
 
 Sounds like you may want a Perl script to automate managing your
 tcpdumps.

99% of the time I use tcpdump for less than a minute, just to verify the
presence or absence of ingress/egress traffic on a box or network.

This was the one time I actually left the shell and let it capture
continuously.

Next time, however, I will wrap it in something to stop this from
happening, or simply use the program's own options:

-c Exit after receiving count packets.

Steve



Re: Managing very large files

2007-10-05 Thread Steve Bertrand
 Check out Tie::File on CPAN. This Perl module treats every line in a
 file as an array element, and the array element is loaded into memory
 when it's being requested. In other words: This will work great with
 huge files such as these, as not the entire file is loaded into memory
 at once.
 
 http://search.cpan.org/~mjd/Tie-File-0.96/lib/Tie/File.pm

Thanks everyone who replied to me regarding this issue.

The above appears to be my best approach.

I haven't had the time yet to look into Tie::File (I've never used that
module before), but I will.

As long as I can read the file in chunks, load the data into variables
(I like the array approach above), and process each chunk independently
without loading the entire file into memory at once, it will do what I
need.

Tks!

Steve


Re: Managing very large files

2007-10-05 Thread Bart Silverstrim

Steve Bertrand wrote:

Heiko Wundram (Beenic) wrote:

On Thursday, 04 October 2007 22:16:29, Steve Bertrand wrote:

This is what I am afraid of. Just out of curiosity, if I did try to read
the entire file into a Perl variable all at once, would the box panic,
or as the saying goes 'what could possibly go wrong'?
Perl most certainly wouldn't make the box panic (at least I hope so :-)), but 
would barf and quit at some point in time when it can't allocate any more 
memory (because all memory is in use). Meanwhile, your swap would've filled 
up completely, and your box would've become totally unresponsive, which goes 
away instantly the second Perl is dead/quits.


Try it. ;-) (at your own risk)


LOL, on a production box?...nope.

Hence why I asked here, probing if someone has made this mistake before
I do ;)


Isn't that what VMWare is for? ;-)


Re: Managing very large files

2007-10-04 Thread Steve Bertrand
Heiko Wundram (Beenic) wrote:
 On Thursday, 04 October 2007 14:43:31, Steve Bertrand wrote:
 Is there any way to accomplish this, preferably with the ability to
 incrementally name each newly created file?
 
 man 1 split

Thanks.

Sheesh it really was that easy.

*puts head in sand*

Steve


Re: Managing very large files

2007-10-04 Thread Heiko Wundram (Beenic)
On Thursday, 04 October 2007 14:43:31, Steve Bertrand wrote:
 Is there any way to accomplish this, preferably with the ability to
 incrementally name each newly created file?

man 1 split

(esp. -l)
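
For what it's worth, a hedged example (the file names are made up, and
this assumes the capture is a plain-text dump):

    split -l 100000 capture.txt capture.part.

This writes capture.part.aa, capture.part.ab, and so on, each 100000
lines long.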

-- 
Heiko Wundram
Product & Application Development


Managing very large files

2007-10-04 Thread Steve Bertrand
Hi all,

I've got a 28GB tcpdump capture file that I need to (hopefully) break
down into a series of files of 100,000k lines or so, ideally without
having to read the entire file all at once.

I need to run a few Perl processes on the data in the file, but AFAICT,
doing so on the entire original file is asking for trouble.

Is there any way to accomplish this, preferably with the ability to
incrementally name each newly created file?

TIA,

Steve


Re: Managing very large files

2007-10-04 Thread Giorgos Keramidas
On 2007-10-04 08:43, Steve Bertrand [EMAIL PROTECTED] wrote:
 Hi all,
 I've got a 28GB tcpdump capture file that I need to (hopefully) break
 down into a series of 100,000k lines or so, hopefully without the need
 of reading the entire file all at once.
 
 I need to run a few Perl processes on the data in the file, but AFAICT,
 doing so on the entire original file is asking for trouble.
 
 Is there any way to accomplish this, preferably with the ability to
 incrementally name each newly created file?

Depending on whether you want to capture only specific parts of the dump
in the 'split output', you may have luck with something like:

tcpdump -r input.pcap -w output.pcap 'filter rules here'

This will read the file sequentially, which can be slower than having it
all in memory, but with a huge file like this it is probably a good idea :)



Re: Managing very large files

2007-10-04 Thread Chad Perrin
On Thu, Oct 04, 2007 at 02:58:22PM +0200, Heiko Wundram (Beenic) wrote:
 On Thursday, 04 October 2007 14:43:31, Steve Bertrand wrote:
  Is there any way to accomplish this, preferably with the ability to
  incrementally name each newly created file?
 
 man 1 split
 
 (esp. -l)

That's probably the best option for a one-shot deal like this.  On the
other hand, Perl itself provides the ability to go through a file one
line at a time, so you could just read a line, operate, write a line (to
a new file) as needed, over and over, until you get through the whole
file.

The real problem would be reading the whole file into a variable (or even
multiple variables) at once.
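
For example, a minimal sketch of that line-at-a-time approach (the file
names and the pattern are made up for illustration):

    #!/usr/bin/perl
    use strict;
    use warnings;

    open my $in,  '<', 'capture.txt'      or die "open input: $!";
    open my $out, '>', 'capture.filtered' or die "open output: $!";

    while ( my $line = <$in> ) {
        # Operate on $line here; only one line is in memory at a time.
        print {$out} $line if $line =~ /some pattern/;
    }

    close $in;
    close $out or die "close output: $!";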

-- 
CCD CopyWrite Chad Perrin [ http://ccd.apotheon.org ]
Isaac Asimov: Part of the inhumanity of the computer is that, once it is
completely programmed and working smoothly, it is completely honest.


Re: Managing very large files

2007-10-04 Thread Steve Bertrand
 man 1 split

 (esp. -l)
 
 That's probably the best option for a one-shot deal like this.  On the
 other hand, Perl itself provides the ability to go through a file one
 line at a time, so you could just read a line, operate, write a line (to
 a new file) as needed, over and over, until you get through the whole
 file.
 
 The real problem would be reading the whole file into a variable (or even
 multiple variables) at once.

This is what I am afraid of. Just out of curiosity, if I did try to read
the entire file into a Perl variable all at once, would the box panic,
or as the saying goes 'what could possibly go wrong'?

Steve



Re: Managing very large files

2007-10-04 Thread Heiko Wundram (Beenic)
On Thursday, 04 October 2007 22:16:29, Steve Bertrand wrote:
 This is what I am afraid of. Just out of curiosity, if I did try to read
 the entire file into a Perl variable all at once, would the box panic,
 or as the saying goes 'what could possibly go wrong'?

Perl most certainly wouldn't make the box panic (at least I hope so :-)), but 
would barf and quit at some point in time when it can't allocate any more 
memory (because all memory is in use). Meanwhile, your swap would've filled 
up completely, and your box would've become totally unresponsive, which goes 
away instantly the second Perl is dead/quits.

Try it. ;-) (at your own risk)

-- 
Heiko Wundram
Product & Application Development


Re: Managing very large files

2007-10-04 Thread Chad Perrin
On Thu, Oct 04, 2007 at 04:25:18PM -0400, Steve Bertrand wrote:
 Heiko Wundram (Beenic) wrote:
  On Thursday, 04 October 2007 22:16:29, Steve Bertrand wrote:
  This is what I am afraid of. Just out of curiosity, if I did try to read
  the entire file into a Perl variable all at once, would the box panic,
  or as the saying goes 'what could possibly go wrong'?
  
  Perl most certainly wouldn't make the box panic (at least I hope so :-)), 
  but 
  would barf and quit at some point in time when it can't allocate any more 
  memory (because all memory is in use). Meanwhile, your swap would've filled 
  up completely, and your box would've become totally unresponsive, which 
  goes 
  away instantly the second Perl is dead/quits.
  
  Try it. ;-) (at your own risk)
 
 LOL, on a production box?...nope.
 
 Hence why I asked here, probing if someone has made this mistake before
 I do ;)
 
 The reason for the massive file size was my haste in running out of the
 office on Friday and forgetting to kill the tcpdump process before the
 weekend began.

Sounds like you may want a Perl script to automate managing your
tcpdumps.

Just a thought.

-- 
CCD CopyWrite Chad Perrin [ http://ccd.apotheon.org ]
Kent Beck: I always knew that one day Smalltalk would replace Java.  I
just didn't know it would be called Ruby.


Re: Managing very large files

2007-10-04 Thread Chad Perrin
On Thu, Oct 04, 2007 at 04:16:29PM -0400, Steve Bertrand wrote:
  man 1 split
 
  (esp. -l)
  
  That's probably the best option for a one-shot deal like this.  On the
  other hand, Perl itself provides the ability to go through a file one
  line at a time, so you could just read a line, operate, write a line (to
  a new file) as needed, over and over, until you get through the whole
  file.
  
  The real problem would be reading the whole file into a variable (or even
  multiple variables) at once.
 
 This is what I am afraid of. Just out of curiosity, if I did try to read
 the entire file into a Perl variable all at once, would the box panic,
 or as the saying goes 'what could possibly go wrong'?

Perl will happily load stuff into RAM until you run out of RAM.  I
imagine it would then keep loading stuff into memory, and the box would
start swapping.  Eventually, you'd run out of swap space.

Perl is known to some as the Swiss Army chainsaw for a reason: it'll
cut limbs off trees about as quickly as you can put limbs in front of it.
If you put one of your own limbs in front of it (say, a leg), it'll do
exactly the same thing -- but with more bleeding and screaming.

It's kinda like Unix, that way.

-- 
CCD CopyWrite Chad Perrin [ http://ccd.apotheon.org ]
Brian K. Reid: In computer science, we stand on each other's feet.


Re: Managing very large files

2007-10-04 Thread Mel
On Thursday 04 October 2007 22:16:29 Steve Bertrand wrote:
  man 1 split
 
  (esp. -l)
 
  That's probably the best option for a one-shot deal like this.  On the
  other hand, Perl itself provides the ability to go through a file one
  line at a time, so you could just read a line, operate, write a line (to
  a new file) as needed, over and over, until you get through the whole
  file.
 
  The real problem would be reading the whole file into a variable (or even
  multiple variables) at once.

 This is what I am afraid of. Just out of curiosity, if I did try to read
 the entire file into a Perl variable all at once, would the box panic,
 or as the saying goes 'what could possibly go wrong'?

There's probably a reason why you want to process that file - splitting it
can be a problem if you need to keep track of state and the split lands on
the wrong line. So I'd probably open it in Perl (or whatever processor)
directly and use a database for storage if I really needed to keep per-line
context, so that my Perl memory stays clean on each line iteration.
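
A rough sketch of that idea, assuming DBD::SQLite is available (the
file, table, and column names are made up):

    #!/usr/bin/perl
    use strict;
    use warnings;
    use DBI;

    # Store per-line context in SQLite instead of keeping it in Perl memory.
    my $dbh = DBI->connect( 'dbi:SQLite:dbname=capture.db', '', '',
        { RaiseError => 1, AutoCommit => 0 } );

    $dbh->do('CREATE TABLE IF NOT EXISTS lines (lineno INTEGER, text TEXT)');
    my $sth = $dbh->prepare('INSERT INTO lines (lineno, text) VALUES (?, ?)');

    open my $in, '<', 'capture.txt' or die "open: $!";
    while ( my $line = <$in> ) {
        chomp $line;
        $sth->execute( $., $line );
        $dbh->commit if $. % 10_000 == 0;   # commit in batches
    }
    close $in;

    $dbh->commit;
    $dbh->disconnect;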

-- 
Mel


Re: Managing very large files

2007-10-04 Thread Steve Bertrand
Heiko Wundram (Beenic) wrote:
 On Thursday, 04 October 2007 22:16:29, Steve Bertrand wrote:
 This is what I am afraid of. Just out of curiosity, if I did try to read
 the entire file into a Perl variable all at once, would the box panic,
 or as the saying goes 'what could possibly go wrong'?
 
 Perl most certainly wouldn't make the box panic (at least I hope so :-)), but 
 would barf and quit at some point in time when it can't allocate any more 
 memory (because all memory is in use). Meanwhile, your swap would've filled 
 up completely, and your box would've become totally unresponsive, which goes 
 away instantly the second Perl is dead/quits.
 
 Try it. ;-) (at your own risk)

LOL, on a production box?...nope.

Hence why I asked here, probing if someone has made this mistake before
I do ;)

The reason for the massive file size was my haste in running out of the
office on Friday and forgetting to kill the tcpdump process before the
weekend began.

Steve


Re: Managing very large files

2007-10-04 Thread Jerry McAllister
On Wed, Oct 03, 2007 at 04:51:08PM -0600, Chad Perrin wrote:

 On Thu, Oct 04, 2007 at 04:25:18PM -0400, Steve Bertrand wrote:
  Heiko Wundram (Beenic) wrote:
   On Thursday, 04 October 2007 22:16:29, Steve Bertrand wrote:
   This is what I am afraid of. Just out of curiosity, if I did try to read
   the entire file into a Perl variable all at once, would the box panic,
   or as the saying goes 'what could possibly go wrong'?
   
   Perl most certainly wouldn't make the box panic (at least I hope so :-)), 
   but 
   would barf and quit at some point in time when it can't allocate any more 
   memory (because all memory is in use). Meanwhile, your swap would've 
   filled 
   up completely, and your box would've become totally unresponsive, which 
   goes 
   away instantly the second Perl is dead/quits.
   
   Try it. ;-) (at your own risk)
  
  LOL, on a production box?...nope.
  
  Hence why I asked here, probing if someone has made this mistake before
  I do ;)
  
  The reason for the massive file size was my haste in running out of the
  office on Friday and forgetting to kill the tcpdump process before the
  weekend began.
 
 Sounds like you may want a Perl script to automate managing your
 tcpdumps.
 
 Just a thought.

Yes.
Actually, you can open that file and start reading it in Perl, opening
files to write out the chunks the way you want them and closing each
when it is full. Make up a name with a counter in it to create the many
chunk files. Pull off whatever data/statistics you want and accumulate
them as you go. You could even decide some of the data isn't worth
keeping and cut the size of your chunks down. But you have to close
each chunk file as you finish it, or you will run out of open file
descriptors. So there has to be a counter loop to keep track of how
much was written to each chunk, and an open and close for each one.
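
Something along those lines, as a sketch only (the chunk size and file
names are assumptions):

    #!/usr/bin/perl
    use strict;
    use warnings;

    my $chunk_size = 100_000;   # lines per output file
    my $chunk_num  = 0;
    my $out;

    open my $in, '<', 'capture.txt' or die "open input: $!";

    while ( my $line = <$in> ) {
        # Open the next numbered chunk file every $chunk_size lines,
        # closing the previous one so we never exhaust file descriptors.
        if ( ( $. - 1 ) % $chunk_size == 0 ) {
            close $out if defined $out;
            my $name = sprintf 'capture.%04d', $chunk_num++;
            open $out, '>', $name or die "open $name: $!";
        }
        print {$out} $line;
    }

    close $out if defined $out;
    close $in;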

jerry

 
 -- 
 CCD CopyWrite Chad Perrin [ http://ccd.apotheon.org ]
 Kent Beck: I always knew that one day Smalltalk would replace Java.  I
 just didn't know it would be called Ruby.