[CentOS] Optimizing grep, sort, uniq for speed

2012-06-28 Thread Sean Carolan
This snippet of code pulls an array of hostnames from some log files. It has to parse around 3GB of log files, so I'm keen on making it as efficient as possible. Can you think of any way to optimize this to run faster? HOSTS=() for host in $(grep -h -o [-\.0-9a-z][-\.0-9a-z]*.com ${TMPDIR}/* |

Re: [CentOS] Optimizing grep, sort, uniq for speed

2012-06-28 Thread m . roth
Sean Carolan wrote: This snippet of code pulls an array of hostnames from some log files. It has to parse around 3GB of log files, so I'm keen on making it as efficient as possible. Can you think of any way to optimize this to run faster? HOSTS=() for host in $(grep -h -o

Re: [CentOS] Optimizing grep, sort, uniq for speed

2012-06-28 Thread Gordon Messmer
On 06/28/2012 12:15 PM, Gordon Messmer wrote: You have two major performance problems in this script. First, UTF-8 processing is slow. Second, wildcards are EXTREMELY SLOW! Naturally, you should test both on your own data. I'm amused to admit that I tested my own advice against my mail log
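
One common way to act on the UTF-8 point is to run the text tools in the C locale, so that grep and sort compare bytes rather than multibyte characters. The following is only a sketch under assumptions: the function name extract_hosts is hypothetical, the pattern is a tightened version of the one in the original post (the dot before "com" is escaped), and the truncated message may well have recommended something different.

```shell
# extract_hosts (hypothetical name): print the unique .com hostnames
# found in the files given as arguments.
# LC_ALL=C is set only for grep and sort, so both work on raw bytes.
# Assumption: the logs contain plain ASCII hostnames.
extract_hosts() {
    LC_ALL=C grep -h -o -E '[-.0-9a-z]+\.com' "$@" | LC_ALL=C sort -u
}
```

It could be called as extract_hosts "${TMPDIR}"/* and timed with and without the LC_ALL=C prefixes; as the message itself says, the gain depends on the data.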

Re: [CentOS] Optimizing grep, sort, uniq for speed

2012-06-28 Thread m . roth
Sean Carolan wrote: Thank you Mark and Gordon. Since the hostnames I needed to collect are in the same field, at least in the lines of the file that are important, I ended up using suggestions from both of you; the code is like this now. The egrep is there to make sure whatever is in the

Re: [CentOS] Optimizing grep, sort, uniq for speed

2012-06-28 Thread Sean Carolan
*sigh* awk is not cut. What you want is awk '{if (/[-\.0-9a-z][-\.0-9a-z]*.com/) { print $9;}}' | sort -u No grep needed; awk looks for what you want *first* this way. Thanks, Mark. This is cleaner code but it benchmarked slower than awk then grep. real 3m35.550s user 2m7.186s sys
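
The single-pass awk approach quoted above can be sketched as a small function. This is an illustration, not the code from the thread: the name hosts_from_field is hypothetical, the field number is passed as an argument (the thread uses field 9), and the pattern escapes the dot before "com", which the quoted one-liner does not.

```shell
# hosts_from_field (hypothetical name): for every line that contains
# something shaped like a .com hostname, print the requested
# whitespace-separated field, then remove duplicates.
# Usage: hosts_from_field FIELD_NUMBER FILE...
hosts_from_field() {
    field=$1
    shift
    # The pattern is matched against the whole line, as in the quoted awk.
    awk -v f="$field" '/[-.0-9a-z]+\.com/ { print $f }' "$@" | sort -u
}
```

For example, hosts_from_field 9 "${TMPDIR}"/* prints the ninth field of each matching line. Since the reply reports that this form ran slower than awk followed by grep on the poster's logs, it is worth comparing both variants with time on real data.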

Re: [CentOS] Optimizing grep, sort, uniq for speed

2012-06-28 Thread Woodchuck
On Thu, Jun 28, 2012 at 01:30:33PM -0500, Sean Carolan wrote: This snippet of code pulls an array of hostnames from some log files. It has to parse around 3GB of log files, so I'm keen on making it as efficient as possible. Can you think of any way to optimize this to run faster? If the key

Re: [CentOS] Optimizing grep, sort, uniq for speed

2012-06-28 Thread m . roth
Woodchuck wrote: On Thu, Jun 28, 2012 at 01:30:33PM -0500, Sean Carolan wrote: This snippet of code pulls an array of hostnames from some log files. It has to parse around 3GB of log files, so I'm keen on making it as efficient as possible. Can you think of any way to optimize this to run

Re: [CentOS] Optimizing grep, sort, uniq for speed

2012-06-28 Thread Sean Carolan
*sigh* awk is not cut. What you want is awk '{if (/[-\.0-9a-z][-\.0-9a-z]*.com/) { print $9;}}' | sort -u I ended up using this construct in my code; this one fetches out servers that are having issues checking in with puppet: awk '{if (/Could not find default node or by name with/) { print