Re: Managing very large files
Steve Bertrand wrote:
>>> man 1 split (esp. -l)
>>
>> That's probably the best option for a one-shot deal like this.
>>
>> On the other hand, Perl itself provides the ability to go through a
>> file one line at a time, so you could just read a line, operate, and
>> write a line (to a new file) as needed, over and over, until you get
>> through the whole file. The real problem would be reading the whole
>> file into a variable (or even multiple variables) at once.
>
> This is what I am afraid of. Just out of curiosity, if I did try to
> read the entire file into a Perl variable all at once, would the box
> panic, or, as the saying goes, 'what could possibly go wrong'?
>
> Steve

Check out Tie::File on CPAN. This Perl module treats every line in a
file as an array element, and an element is only loaded into memory
when it is requested. In other words, it works well with huge files
like this one, because the entire file is never loaded into memory at
once.

http://search.cpan.org/~mjd/Tie-File-0.96/lib/Tie/File.pm

Jorn
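As a rough sketch of how Tie::File would be used here (the file name
is made up, and the real per-line processing goes in the loop body):

    #!/usr/bin/perl
    use strict;
    use warnings;
    use Tie::File;

    # 'capture.txt' is a placeholder for the real capture file.
    tie my @lines, 'Tie::File', 'capture.txt'
        or die "Cannot tie capture.txt: $!";

    # Elements are fetched from disk on demand, so only the lines
    # currently being worked on occupy memory, never the whole file.
    my $i = 0;
    while (defined(my $line = $lines[$i])) {
        # ... process $line here ...
        $i++;
    }

    untie @lines;

Note that assignments to the tied array are written straight back to
the file, so for read-only analysis it may be worth opening it with
Tie::File's mode option set to O_RDONLY (from Fcntl).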
Re: Managing very large files
>> The reason for the massive file size was my haste in running out of
>> the office on Friday and forgetting to kill the tcpdump process
>> before the weekend began.
>
> Sounds like you may want a Perl script to automate managing your
> tcpdumps.

99% of the time I use tcpdump for less than a minute, just to verify
the presence or absence of ingress/egress traffic on a box or network.
This was the one time I actually left the shell and let it capture
continuously. Next time, however, I will wrap it with something to
stop this from happening, or simply use the option the program itself
provides:

     -c      Exit after receiving count packets.

Steve
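A wrapper along those lines might look like the following; the
interface name, packet count, and one-hour timeout are all invented
for the example:

    #!/usr/bin/perl
    use strict;
    use warnings;

    # Illustrative values only; adjust interface, count and
    # timeout for the real capture.
    my @cmd = ('tcpdump', '-i', 'em0', '-c', '100000',
               '-w', 'capture.pcap');

    my $pid = fork();
    die "fork failed: $!" unless defined $pid;
    if ($pid == 0) {
        exec @cmd or die "exec tcpdump failed: $!";
    }

    # Even if the packet count is never reached, stop the capture
    # after an hour so it cannot run all weekend.
    local $SIG{ALRM} = sub { kill 'TERM', $pid };
    alarm 3600;
    waitpid $pid, 0;
    alarm 0;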
Re: Managing very large files
> Check out Tie::File on CPAN. This Perl module treats every line in a
> file as an array element, and an element is only loaded into memory
> when it is requested. In other words, it works well with huge files
> like this one, because the entire file is never loaded into memory at
> once.
>
> http://search.cpan.org/~mjd/Tie-File-0.96/lib/Tie/File.pm

Thanks to everyone who replied to me regarding this issue. The above
appears to be my best approach. I haven't had time yet to look into
Tie::File (I've never used that module before), but I will. As long as
I can read chunks of the file, load the data into variables (I like
the array approach above), and process each chunk independently,
without loading all of them into memory at once and without having to
load the entire file into memory, it will do what I need.

Tks!

Steve
Re: Managing very large files
Steve Bertrand wrote:
> Heiko Wundram (Beenic) wrote:
>> On Thursday, 04 October 2007 22:16:29, Steve Bertrand wrote:
>>> This is what I am afraid of. Just out of curiosity, if I did try to
>>> read the entire file into a Perl variable all at once, would the
>>> box panic, or, as the saying goes, 'what could possibly go wrong'?
>>
>> Perl most certainly wouldn't make the box panic (at least I hope so
>> :-)), but it would barf and quit at the point where it can no longer
>> allocate memory (because all of it is in use). Meanwhile, your swap
>> will have filled up completely and the box will have become totally
>> unresponsive, which goes away the instant Perl dies/quits.
>>
>> Try it. ;-) (at your own risk)
>
> LOL, on a production box? ...nope. Hence why I asked here, probing
> whether someone has made this mistake before I do ;)

Isn't that what VMware is for? ;-)
Re: Managing very large files
Heiko Wundram (Beenic) wrote:
> On Thursday, 04 October 2007 14:43:31, Steve Bertrand wrote:
>> Is there any way to accomplish this, preferably with the ability to
>> incrementally name each newly created file?
>
> man 1 split

Thanks. Sheesh, it really was that easy. *puts head in sand*

Steve
Re: Managing very large files
On Thursday, 04 October 2007 14:43:31, Steve Bertrand wrote:
> Is there any way to accomplish this, preferably with the ability to
> incrementally name each newly created file?

man 1 split (esp. -l)

--
Heiko Wundram
Product Application Development
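To make the -l suggestion concrete (the input name and prefix here are
hypothetical), a command such as

    split -l 100000 capture.txt chunk.

would write chunk.aa, chunk.ab, chunk.ac and so on, 100,000 lines
apiece, which covers the incremental naming asked about above.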
Managing very large files
Hi all,

I've got a 28GB tcpdump capture file that I need to (hopefully) break
down into a series of files of 100,000 lines or so each, ideally
without having to read the entire file at once.

I need to run a few Perl processes on the data in the file, but
AFAICT, doing so on the entire original file is asking for trouble.

Is there any way to accomplish this, preferably with the ability to
incrementally name each newly created file?

TIA,

Steve
Re: Managing very large files
On 2007-10-04 08:43, Steve Bertrand [EMAIL PROTECTED] wrote:
> Hi all,
>
> I've got a 28GB tcpdump capture file that I need to (hopefully) break
> down into a series of files of 100,000 lines or so each, ideally
> without having to read the entire file at once.
>
> I need to run a few Perl processes on the data in the file, but
> AFAICT, doing so on the entire original file is asking for trouble.
>
> Is there any way to accomplish this, preferably with the ability to
> incrementally name each newly created file?

Depending on whether you want to capture only specific parts of the
dump in the 'split output', you may have luck with something like:

    tcpdump -r input.pcap -w output.pcap 'filter rules here'

This will read the file sequentially, which can be slower than having
it all in memory, but with a huge file like this it is probably a good
idea :)
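As a concrete (made-up) example of the filter part, repeating
something like

    tcpdump -r input.pcap -w http.pcap 'tcp port 80'

once per traffic class of interest effectively splits the capture by
protocol rather than by line count.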
Re: Managing very large files
On Thu, Oct 04, 2007 at 02:58:22PM +0200, Heiko Wundram (Beenic) wrote:
> On Thursday, 04 October 2007 14:43:31, Steve Bertrand wrote:
>> Is there any way to accomplish this, preferably with the ability to
>> incrementally name each newly created file?
>
> man 1 split (esp. -l)

That's probably the best option for a one-shot deal like this.

On the other hand, Perl itself provides the ability to go through a
file one line at a time, so you could just read a line, operate, and
write a line (to a new file) as needed, over and over, until you get
through the whole file. The real problem would be reading the whole
file into a variable (or even multiple variables) at once.

--
CCD CopyWrite Chad Perrin [ http://ccd.apotheon.org ]
Isaac Asimov: "Part of the inhumanity of the computer is that, once it
is completely programmed and working smoothly, it is completely
honest."
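A minimal sketch of that read-operate-write loop, with invented file
names:

    #!/usr/bin/perl
    use strict;
    use warnings;

    open my $in,  '<', 'capture.txt'  or die "open capture.txt: $!";
    open my $out, '>', 'filtered.txt' or die "open filtered.txt: $!";

    # Only the current line is ever held in memory.
    while (my $line = <$in>) {
        # ... operate on $line here ...
        print {$out} $line;
    }

    close $in;
    close $out or die "close filtered.txt: $!";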
Re: Managing very large files
>> man 1 split (esp. -l)
>
> That's probably the best option for a one-shot deal like this.
>
> On the other hand, Perl itself provides the ability to go through a
> file one line at a time, so you could just read a line, operate, and
> write a line (to a new file) as needed, over and over, until you get
> through the whole file. The real problem would be reading the whole
> file into a variable (or even multiple variables) at once.

This is what I am afraid of. Just out of curiosity, if I did try to
read the entire file into a Perl variable all at once, would the box
panic, or, as the saying goes, 'what could possibly go wrong'?

Steve
Re: Managing very large files
On Thursday, 04 October 2007 22:16:29, Steve Bertrand wrote:
> This is what I am afraid of. Just out of curiosity, if I did try to
> read the entire file into a Perl variable all at once, would the box
> panic, or, as the saying goes, 'what could possibly go wrong'?

Perl most certainly wouldn't make the box panic (at least I hope so
:-)), but it would barf and quit at the point where it can no longer
allocate memory (because all of it is in use). Meanwhile, your swap
will have filled up completely and the box will have become totally
unresponsive, which goes away the instant Perl dies/quits.

Try it. ;-) (at your own risk)

--
Heiko Wundram
Product Application Development
Re: Managing very large files
On Thu, Oct 04, 2007 at 04:25:18PM -0400, Steve Bertrand wrote:
> Heiko Wundram (Beenic) wrote:
>> On Thursday, 04 October 2007 22:16:29, Steve Bertrand wrote:
>>> This is what I am afraid of. Just out of curiosity, if I did try to
>>> read the entire file into a Perl variable all at once, would the
>>> box panic, or, as the saying goes, 'what could possibly go wrong'?
>>
>> Perl most certainly wouldn't make the box panic (at least I hope so
>> :-)), but it would barf and quit at the point where it can no longer
>> allocate memory (because all of it is in use). Meanwhile, your swap
>> will have filled up completely and the box will have become totally
>> unresponsive, which goes away the instant Perl dies/quits.
>>
>> Try it. ;-) (at your own risk)
>
> LOL, on a production box? ...nope. Hence why I asked here, probing
> whether someone has made this mistake before I do ;)
>
> The reason for the massive file size was my haste in running out of
> the office on Friday and forgetting to kill the tcpdump process
> before the weekend began.

Sounds like you may want a Perl script to automate managing your
tcpdumps. Just a thought.

--
CCD CopyWrite Chad Perrin [ http://ccd.apotheon.org ]
Kent Beck: "I always knew that one day Smalltalk would replace Java. I
just didn't know it would be called Ruby."
Re: Managing very large files
On Thu, Oct 04, 2007 at 04:16:29PM -0400, Steve Bertrand wrote:
>>> man 1 split (esp. -l)
>>
>> That's probably the best option for a one-shot deal like this.
>>
>> On the other hand, Perl itself provides the ability to go through a
>> file one line at a time, so you could just read a line, operate, and
>> write a line (to a new file) as needed, over and over, until you get
>> through the whole file. The real problem would be reading the whole
>> file into a variable (or even multiple variables) at once.
>
> This is what I am afraid of. Just out of curiosity, if I did try to
> read the entire file into a Perl variable all at once, would the box
> panic, or, as the saying goes, 'what could possibly go wrong'?

Perl will happily load stuff into RAM until you run out of RAM. I
imagine it would then keep loading stuff into memory, and the box
would start swapping. Eventually, you'd run out of swap space.

Perl is known to some as the Swiss Army chainsaw for a reason: it'll
cut limbs off trees about as quickly as you can put limbs in front of
it. If you put one of your own limbs in front of it (say, a leg),
it'll do exactly the same thing -- but with more bleeding and
screaming. It's kinda like Unix, that way.

--
CCD CopyWrite Chad Perrin [ http://ccd.apotheon.org ]
Brian K. Reid: "In computer science, we stand on each other's feet."
Re: Managing very large files
On Thursday, 04 October 2007 22:16:29, Steve Bertrand wrote:
>>> man 1 split (esp. -l)
>>
>> That's probably the best option for a one-shot deal like this.
>>
>> On the other hand, Perl itself provides the ability to go through a
>> file one line at a time, so you could just read a line, operate, and
>> write a line (to a new file) as needed, over and over, until you get
>> through the whole file. The real problem would be reading the whole
>> file into a variable (or even multiple variables) at once.
>
> This is what I am afraid of. Just out of curiosity, if I did try to
> read the entire file into a Perl variable all at once, would the box
> panic, or, as the saying goes, 'what could possibly go wrong'?

There's probably a reason why you want to process that file as a
whole: splitting it can be a problem if you need to keep track of some
state and it splits on the wrong line. So I'd probably open it in perl
(or whatever processor) directly, and use a database for storage if I
really needed to keep string context, so that on each line iteration
my perl memory is clean.

--
Mel
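A sketch of Mel's suggestion, assuming DBD::SQLite is available; the
table layout and what gets pulled from each line are made up for
illustration:

    #!/usr/bin/perl
    use strict;
    use warnings;
    use DBI;

    my $dbh = DBI->connect('dbi:SQLite:dbname=state.db', '', '',
                           { RaiseError => 1, AutoCommit => 0 });
    $dbh->do('CREATE TABLE IF NOT EXISTS state'
           . ' (k TEXT PRIMARY KEY, v TEXT)');
    my $sth = $dbh->prepare(
        'INSERT OR REPLACE INTO state (k, v) VALUES (?, ?)');

    open my $in, '<', 'capture.txt' or die "open capture.txt: $!";
    while (my $line = <$in>) {
        # Derive whatever state you need from $line and park it in
        # the database instead of a growing Perl hash, e.g.:
        # $sth->execute($key, $value);
    }
    close $in;

    $dbh->commit;
    $dbh->disconnect;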
Re: Managing very large files
Heiko Wundram (Beenic) wrote:
> On Thursday, 04 October 2007 22:16:29, Steve Bertrand wrote:
>> This is what I am afraid of. Just out of curiosity, if I did try to
>> read the entire file into a Perl variable all at once, would the box
>> panic, or, as the saying goes, 'what could possibly go wrong'?
>
> Perl most certainly wouldn't make the box panic (at least I hope so
> :-)), but it would barf and quit at the point where it can no longer
> allocate memory (because all of it is in use). Meanwhile, your swap
> will have filled up completely and the box will have become totally
> unresponsive, which goes away the instant Perl dies/quits.
>
> Try it. ;-) (at your own risk)

LOL, on a production box? ...nope. Hence why I asked here, probing
whether someone has made this mistake before I do ;)

The reason for the massive file size was my haste in running out of
the office on Friday and forgetting to kill the tcpdump process before
the weekend began.

Steve
Re: Managing very large files
On Wed, Oct 03, 2007 at 04:51:08PM -0600, Chad Perrin wrote:
> On Thu, Oct 04, 2007 at 04:25:18PM -0400, Steve Bertrand wrote:
>> Heiko Wundram (Beenic) wrote:
>>> On Thursday, 04 October 2007 22:16:29, Steve Bertrand wrote:
>>>> This is what I am afraid of. Just out of curiosity, if I did try
>>>> to read the entire file into a Perl variable all at once, would
>>>> the box panic, or, as the saying goes, 'what could possibly go
>>>> wrong'?
>>>
>>> Perl most certainly wouldn't make the box panic (at least I hope so
>>> :-)), but it would barf and quit at the point where it can no
>>> longer allocate memory (because all of it is in use). Meanwhile,
>>> your swap will have filled up completely and the box will have
>>> become totally unresponsive, which goes away the instant Perl
>>> dies/quits.
>>>
>>> Try it. ;-) (at your own risk)
>>
>> LOL, on a production box? ...nope. Hence why I asked here, probing
>> whether someone has made this mistake before I do ;)
>>
>> The reason for the massive file size was my haste in running out of
>> the office on Friday and forgetting to kill the tcpdump process
>> before the weekend began.
>
> Sounds like you may want a Perl script to automate managing your
> tcpdumps. Just a thought.

Yes. Actually, you can open that file in Perl, start reading it, and
open files to write out the chunks the way you want them, then close
each one. Make up a name with a counter in it to create the many chunk
files. Suck off some data/statistics and accumulate the info you want
as you go. You could even decide some of it isn't worth keeping and
cut the size of your chunks down if you don't need all of it.

But you would have to close each of those chunk files, or you would
run out of space for open files. So there would have to be a counter
loop to keep track of how much was written to each chunk, with an open
and close for each one.

jerry
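In code, jerry's counter loop might look something like this (the
chunk size and file names are assumptions):

    #!/usr/bin/perl
    use strict;
    use warnings;

    my $chunk_size = 100_000;
    my ($count, $suffix) = (0, 0);

    open my $in, '<', 'capture.txt' or die "open capture.txt: $!";
    my $out;

    while (my $line = <$in>) {
        # Start a new chunk file every $chunk_size lines, closing
        # the previous one so open file handles don't pile up.
        if ($count % $chunk_size == 0) {
            close $out if $out;
            my $name = sprintf 'chunk.%04d', $suffix++;
            open $out, '>', $name or die "open $name: $!";
        }
        print {$out} $line;
        $count++;
    }

    close $out if $out;
    close $in;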