Re: Regular Expression Trouble
Paul Chvostek wrote:
>> This is an attempt to isolate every MAC address that appears and then sort and count them to see who is having trouble or, in some cases, is causing trouble.
>
> Then you still may want to use awk for some of that...
>
> cat /var/log/dhcpd.log | \
>  sed -nE 's/.*([0-9a-f]{2}(:[0-9a-f]{2}){5}).*/\1/p' | \
>  awk '
>   { a[$1]++; }
>   END {
>    for(i in a){
>     printf("%7.0f\t%s\n", a[i], i);
>    }
>   }
>  ' | sort -nr

Euhmm:

sed -nE 's/.*([0-9a-f]{2}(:[0-9a-f]{2}){5}).*/\1/p' /var/log/dhcpd.log |\
 sort | uniq -c

:-)

Peter
--
http://www.boosten.org
___
freebsd-questions@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-questions
To unsubscribe, send any mail to [EMAIL PROTECTED]
Re: Regular Expression Trouble
On Tue, 2008-08-26 at 22:12 -0500, Martin McCormick wrote:
> I am trying to isolate only the MAC addresses that appear in dhcpd logs. For anyone who is interested, the sed construct that should do this looks like:
>
> sed 's/.*\([[ your regular expression ]]\).*/\1/'
>
> The \1 tells sed to only print what matched and skip all the rest.
>
> I am doing something wrong with the regular expression that is supposed to recognise a MAC address. MAC addresses look like 5 pairs of hex digits followed by :'s and then a 6TH pair to end the string.
>
> I have tried:
>
> [[:xdigit:][:xdigit:][:punct:]
>
> Sorry. It won't all fit on a line, but there should be a string of 5 pairs and the : and then the 6TH pair followed by the closing ] so the expression ends with ]]
>
> One should also be able to put:
>
> [[:xdigit:][:xdigit:][:punct:]]\{5,5\}[[:xdigit:][:xdigit]]
>
> Any ideas as to what else I can try?

There have already been good suggestions for you to choose from. I'll just add my bucketful to the TIMTOWTDI pool.

Since you weren't specific about the format of the log data that you're attempting to parse (keep that in mind for future questions), here are some examples using ifconfig output instead:

# ifconfig | grep ether
ether 02:00:20:75:43:34
ether 00:40:05:10:b9:79

# ifconfig | sed -nE 's/.*ether (([[:xdigit:]]{2}:){5}[[:xdigit:]]{2}).*/\1/p'
02:00:20:75:43:34
00:40:05:10:b9:79

# ifconfig | sed -nE 's/.*ether ([[:xdigit:]:]+).*/\1/p'
02:00:20:75:43:34
00:40:05:10:b9:79

# ifconfig | sed -nE 's/.*ether ([0-9a-f:]+).*/\1/p'
02:00:20:75:43:34
00:40:05:10:b9:79

# ifconfig | sed -nE '/ether/s/.*([0-9a-f:]{17}).*/\1/p'
02:00:20:75:43:34
00:40:05:10:b9:79

And then there's:

# ifconfig | grep ether | cut -d ' ' -f 2
02:00:20:75:43:34
00:40:05:10:b9:79

But my preference would be:

# ifconfig | awk '/ether/ {print $2}'
02:00:20:75:43:34
00:40:05:10:b9:79

Wayne
Re: Regular Expression Trouble
My thanks to several people who have provided great suggestions and an apology for not being clear on the log data I am mining for MAC addresses. It is syslog and a typical line looks like:

Aug 26 20:45:36 dh1 dhcpd: DHCPACK on 10.198.67.116 to 00:12:f0:88:97:d6
(peaster-laptop) via 10.198.71.246

That was one line broken to aid in emailing, but that's what types of lines are involved. The MAC appears at different field locations depending on the type of event being logged so awk is perfect for certain types of lines, but it misses others and no one awk expression gets them all.

This is an attempt to isolate every MAC address that appears and then sort and count them to see who is having trouble or, in some cases, is causing trouble.

The sed pattern matching system is interesting because I can think of several similar situations in which the data are there but there is no guarantee where on a given line it sits and grep or sed usually will pull in the whole line containing the desired data which means that one must further parse things to get what is wanted.

Martin McCormick
Re: Regular Expression Trouble
On Wednesday 27 August 2008 15:25:02 Martin McCormick wrote:
> The sed pattern matching system is interesting because I can think of several similar situations in which the data are there but there is no guarantee where on a given line it sits and grep or sed usually will pull in the whole line containing the desired data which means that one must further parse things to get what is wanted.

Hi Martin

Look at grep -o, which only outputs the bit that matched the regexp. Using egrep, you can look for exactly two hex digits and a colon, repeated exactly five times, and followed by exactly two hex digits:

egrep -o '([[:xdigit:]]{2}:){5}[[:xdigit:]]{2}' inputfile

will parse inputfile and output all the MAC addresses it finds, one per line (if it finds more than one on an input line, it'll match them and print them on separate output lines), and nothing else.

Jonathan
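A minimal sketch of the grep -o approach combined with the sort-and-count step asked about earlier in the thread. The log lines are made-up sample data written to a temporary file (the third line in particular is an assumption about what other dhcpd events look like), and it assumes a grep that supports -o, as the GNU and BSD versions do.

```shell
# Hypothetical sample data standing in for a real dhcpd syslog file.
log=$(mktemp)
cat > "$log" <<'EOF'
Aug 26 20:45:36 dh1 dhcpd: DHCPDISCOVER from 00:12:f0:88:97:d6 (peaster-laptop) via eth0
Aug 26 20:45:36 dh1 dhcpd: DHCPACK on 10.198.67.116 to 00:12:f0:88:97:d6 (peaster-laptop) via 10.198.71.246
Aug 26 20:45:40 dh1 dhcpd: DHCPREQUEST for 10.198.67.117 from 00:16:d3:9e:eb:74 via eth0
EOF

# -E: extended regex (same as egrep); -o: print only the matched text.
macs=$(grep -Eo '([[:xdigit:]]{2}:){5}[[:xdigit:]]{2}' "$log")
printf '%s\n' "$macs"

# Count occurrences per address, busiest first.
counts=$(printf '%s\n' "$macs" | sort | uniq -c | sort -rn)
printf '%s\n' "$counts"

rm -f "$log"
```

The timestamps and IP addresses are not picked up because neither contains six colon-separated pairs of hex digits.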
Re: Regular Expression Trouble
On Wed, 2008-08-27 at 08:25 -0500, Martin McCormick wrote:
> My thanks to several people who have provided great suggestions and an apology for not being clear on the log data I am mining for MAC addresses. It is syslog and a typical line looks like:
>
> Aug 26 20:45:36 dh1 dhcpd: DHCPACK on 10.198.67.116 to 00:12:f0:88:97:d6
> (peaster-laptop) via 10.198.71.246
>
> That was one line broken to aid in emailing, but that's what types of lines are involved. The MAC appears at different field locations depending on the type of event being logged so awk is perfect for certain types of lines, but it misses others and no one awk expression gets them all.

The way to deal with that is to specify a pattern to match something that distinguishes each form of log line that you want to extract from. With the following (contrived) log data:

Aug 26 20:45:36 dh1 dhcpd: DHCPDISCOVER from 00:12:f0:88:97:d6 (peaster-laptop) via eth0
Aug 26 20:45:36 dh1 dhcpd: DHCPACK on 10.198.67.116 to 00:12:f0:88:97:d6 (peaster-laptop) via 10.198.71.246

use awk with a script such as:

awk '/DHCPDISCOVER/ {print $8} /DHCPACK/ {print $10}' logfile

Wayne
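As a rough sketch, the one-pattern-per-message-type script can be run against sample data like this. The third log line is invented for illustration, and the second awk command is not from the thread: it is an alternative that scans every field for something MAC-shaped, under the assumption that the address is always its own whitespace-separated field. It avoids interval expressions such as {2}, which some older awks do not support.

```shell
# Hypothetical sample data; the DHCPREQUEST line is an assumed format.
log=$(mktemp)
cat > "$log" <<'EOF'
Aug 26 20:45:36 dh1 dhcpd: DHCPDISCOVER from 00:12:f0:88:97:d6 (peaster-laptop) via eth0
Aug 26 20:45:36 dh1 dhcpd: DHCPACK on 10.198.67.116 to 00:12:f0:88:97:d6 (peaster-laptop) via 10.198.71.246
Aug 26 20:45:40 dh1 dhcpd: DHCPREQUEST for 10.198.67.117 from 00:16:d3:9e:eb:74 via eth0
EOF

# One pattern per message type, each printing the field that holds the MAC.
# Message types without a pattern (DHCPREQUEST here) are silently skipped.
by_type=$(awk '/DHCPDISCOVER/ {print $8} /DHCPACK/ {print $10}' "$log")
printf '%s\n' "$by_type"

# Alternative: test every field. A run of "xx:" groups followed by "xx"
# that is exactly 17 characters long must be five groups plus a final pair.
by_scan=$(awk '{
  for (i = 1; i <= NF; i++)
    if (length($i) == 17 && $i ~ /^([0-9a-f][0-9a-f]:)+[0-9a-f][0-9a-f]$/)
      print $i
}' "$log")
printf '%s\n' "$by_scan"

rm -f "$log"
```

The first command prints two addresses and the second prints three, since only the field scan catches the DHCPREQUEST line.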
Re: Regular Expression Trouble
Hi Martin.

On Wed, Aug 27, 2008 at 08:25:02AM -0500, Martin McCormick wrote:
> Aug 26 20:45:36 dh1 dhcpd: DHCPACK on 10.198.67.116 to 00:12:f0:88:97:d6
> (peaster-laptop) via 10.198.71.246
>
> That was one line broken to aid in emailing, but that's what types of lines are involved. The MAC appears at different field locations depending on the type of event being logged so awk is perfect for certain types of lines, but it misses others and no one awk expression gets them all.

While I agree with others that awk should be used with explicit recognition of the particular lines, you can still snatch everything with sed if you want to. In FreeBSD, sed supports extended regex, so:

sed -nE 's/.*([0-9a-f]{2}(:[0-9a-f]{2}){5}).*/\1/p'

The -n option tells sed not to print the line unless explicitly instructed to, and the p modifier at the end is that instruction. As for the regex ... well, that's straightforward enough.

> This is an attempt to isolate every MAC address that appears and then sort and count them to see who is having trouble or, in some cases, is causing trouble.

Then you still may want to use awk for some of that...

cat /var/log/dhcpd.log | \
 sed -nE 's/.*([0-9a-f]{2}(:[0-9a-f]{2}){5}).*/\1/p' | \
 awk '
  { a[$1]++; }
  END {
   for(i in a){
    printf("%7.0f\t%s\n", a[i], i);
   }
  }
 ' | sort -nr

You can join the lines into a single command line if you like, or toss it as-is into a tiny shell script. Awk is forgiving about whitespace.

You should theoretically be able to feed the same regex to awk, but I've found that awk's eregex support sometimes doesn't work as I'd expect.

Hope this helps.

p

--
Paul Chvostek [EMAIL PROTECTED]
it.canada http://www.it.ca/
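A small sketch of how the sed expression behaves, using invented input strings rather than real log lines. It illustrates two things implied above: with -n and the p flag, lines without a match produce no output, and because the leading .* is greedy, a line holding two addresses yields only the last one. The counting step reuses the awk array idea on a hand-written list of addresses.

```shell
# The capture group from the sed command above.
re='([0-9a-f]{2}(:[0-9a-f]{2}){5})'

# Made-up line with two addresses: the greedy leading .* keeps the last one.
line='swap from 00:11:22:33:44:55 to 66:77:88:99:aa:bb done'
last=$(printf '%s\n' "$line" | sed -nE "s/.*$re.*/\1/p")
printf 'last match: %s\n' "$last"

# No address on the line: -n suppresses it and p never fires.
none=$(printf '%s\n' 'no address here' | sed -nE "s/.*$re.*/\1/p")
printf 'no match:   [%s]\n' "$none"

# Tally addresses in an awk array, then sort by count, highest first.
tally=$(printf '%s\n' 00:12:f0:88:97:d6 00:16:d3:9e:eb:74 00:12:f0:88:97:d6 |
  awk '{ a[$1]++ }
       END { for (i in a) printf("%7.0f\t%s\n", a[i], i) }' |
  sort -nr)
printf '%s\n' "$tally"
```

If every address on a line matters, a tool that reports each match separately, such as grep -o, may be a better fit than this substitution.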
Re: Regular Expression Trouble
Paul Chvostek writes:
> While I agree with others that awk should be used with explicit recognition of the particular lines, you can still snatch everything with sed if you want to. In FreeBSD, sed supports extended regex, so:
>
> sed -nE 's/.*([0-9a-f]{2}(:[0-9a-f]{2}){5}).*/\1/p'
>
> The -n option tells sed not to print the line unless explicitly instructed to, and the p modifier at the end is that instruction. As for the regex ... well, that's straightforward enough.
>
>> This is an attempt to isolate every MAC address that appears and then sort and count them to see who is having trouble or, in some cases, is causing trouble.
>
> Then you still may want to use awk for some of that...

Actually, I have been using awk, but maybe not as efficiently as I could be, judging from the responses. I was hoping that the sed script would recognize 6 pairs of hex digits connected by :'s no matter where they appeared in a line and give me just that pattern match. In this case, I don't care why the MAC address was printed, only that it was, and having nothing but MACs makes the rest of the sorting and counting trivial.

Other helpful examples not quoted but much appreciated. . .

> Hope this helps.

It helps a lot. Awk is one of those things that one can use for years and still not exploit all the good things it has. I am amazed even after years of using UNIX how much genius is packed into the basic system.

One last sed observation. I did fail to use the -E flag so sed didn't know it should be using extended RE's.

I will give your examples a try for both sed and awk and see what new capabilities I can come up with.

Again, a thousand thanks to you and everyone else for your answers and patience. It is good to see many different ways of solving the same problem.

Martin McCormick
Regular Expression Trouble
I am trying to isolate only the MAC addresses that appear in dhcpd logs. For anyone who is interested, the sed construct that should do this looks like:

sed 's/.*\([[ your regular expression ]]\).*/\1/'

The \1 tells sed to only print what matched and skip all the rest.

I am doing something wrong with the regular expression that is supposed to recognise a MAC address. MAC addresses look like 5 pairs of hex digits followed by :'s and then a 6TH pair to end the string.

I have tried:

[[:xdigit:][:xdigit:][:punct:]

Sorry. It won't all fit on a line, but there should be a string of 5 pairs and the : and then the 6TH pair followed by the closing ] so the expression ends with ]]

One should also be able to put:

[[:xdigit:][:xdigit:][:punct:]]\{5,5\}[[:xdigit:][:xdigit]]

Any ideas as to what else I can try?

What happens is I get single characters per line that look like the first or maybe the last character in that line, but certainly nothing useful or nothing that remotely looks like a MAC address.

Any ideas as to what's wrong with the regular expression?

Many thanks.

Martin McCormick WB5AGZ  Stillwater, OK
Systems Engineer
OSU Information Technology Department Telecommunications Services Group
Re: Regular Expression Trouble
Martin McCormick wrote:
> I am trying to isolate only the MAC addresses that appear in dhcpd logs. For anyone who is interested, the sed construct that should do this looks like:
>
> sed 's/.*\([[ your regular expression ]]\).*/\1/'
>
> The \1 tells sed to only print what matched and skip all the rest.
>
> I am doing something wrong with the regular expression that is supposed to recognise a MAC address. MAC addresses look like 5 pairs of hex digits followed by :'s and then a 6TH pair to end the string.
>
> I have tried:
>
> [[:xdigit:][:xdigit:][:punct:]
>
> Sorry. It won't all fit on a line, but there should be a string of 5 pairs and the : and then the 6TH pair followed by the closing ] so the expression ends with ]]
>
> One should also be able to put:
>
> [[:xdigit:][:xdigit:][:punct:]]\{5,5\}[[:xdigit:][:xdigit]]
>
> Any ideas as to what else I can try?
>
> What happens is I get single characters per line that look like the first or maybe the last character in that line, but certainly nothing useful or nothing that remotely looks like a MAC address.
>
> Any ideas as to what's wrong with the regular expression?
>
> Many thanks.

I don't have a separate dhcp log and you didn't make it clear if you do, but I do have something similar written for awk that parses the system log file.

awk ' /DHCPREQUEST/ { print $10 } ' /var/log/messages

Maybe that will help.

~Paul
Re: Regular Expression Trouble
On Tue, Aug 26, 2008, Martin McCormick wrote:
> I am trying to isolate only the MAC addresses that appear in dhcpd logs. For anyone who is interested, the sed construct that should do this looks like:
>
> sed 's/.*\([[ your regular expression ]]\).*/\1/'
>
> The \1 tells sed to only print what matched and skip all the rest.

I just tried this, and it worked:

sed -n 's;.* to \([0-9:a-z]*\) via.*;\1;p' logfile

It would have been easier in perl or python where one could use the pattern '.* to (\S+) via.*'.

Bill
--
INTERNET: [EMAIL PROTECTED]       Bill Campbell; Celestial Software LLC
URL: http://www.celestial.com/   PO Box 820; 6641 E. Mercer Way
Voice: (206) 236-1676            Mercer Island, WA 98040-0820
Fax: (206) 232-9186

My reading of history convinces me that most bad government results from too much government. --Thomas Jefferson.
Re: Regular Expression Trouble
At 2008-08-26T22:12:19-05:00, Martin McCormick wrote:
> I am trying to isolate only the MAC addresses that appear in dhcpd logs. For anyone who is interested, the sed construct that should do this looks like:

It'd be better if you post a few relevant lines of the log file. Pending that, I suggest awk(1) in case the log file format is similar to the following snippet of `dhcpd.leases' on an OpenBSD server:

% cat dhcpd.leases
lease 192.168.10.216 {
        starts 3 2008/07/16 23:17:29;
        ends 4 2008/07/17 00:17:29;
        tstp 4 2008/07/17 00:17:29;
        binding state free;
        hardware ethernet 00:1f:c6:81:66:a6;
}
lease 192.168.10.65 {
        starts 4 2008/07/17 11:15:48;
        ends 5 2008/07/18 11:15:48;
        tstp 5 2008/07/18 11:15:48;
        binding state free;
        hardware ethernet 00:16:d3:9e:eb:74;
}

% awk '/hardware ethernet/ { print substr($3, 4, 14) }' dhcpd.leases
1f:c6:81:66:a6
16:d3:9e:eb:74

Raghavendra.

--
N. Raghavendra [EMAIL PROTECTED] | http://www.retrotexts.net/
Harish-Chandra Research Institute | http://www.mri.ernet.in/
See message headers for contact and OpenPGP information.
Re: Regular Expression Trouble
Parv writes:
> For even finer results, use word boundaries ...
>
> egrep '\bIN[^[:alnum:]]+A\b' file
> egrep '\<IN[[:space:]]+A\>' file

I sincerely thank all for your examples, which I have saved for future reference. The word boundary test appears to work perfectly, but after looking at all the other examples, they should work too, which is living proof that in UNIX there are many perfectly valid ways to solve the same problem.

Again, many thanks.

Martin McCormick WB5AGZ  Stillwater, OK
OSU Information Technology Department Network Operations Group
.-- -... . .- --. --..
Regular Expression Trouble
After reading a bit about extended regular expressions and having a few actually work correctly in sed scripts, I tried one in egrep and it isn't working although there are no errors.

I was hoping to get only the A records from a dns zone file so the expression I used is:

egrep [[:space:]IN[:space:]A[:space:]] zone_file > h0

Had it worked, all that would have appeared in the h0 file was all the A or Address records. Instead, I get those plus almost everything else in the file. It is obviously not filtering correctly.

Plain grep produces a 0-length output file so that is not too useful either. Putting double quotes around the RE didn't help either. It seems to match almost everything.

Thanks for any good ideas.

Martin McCormick WB5AGZ  Stillwater, OK
OSU Information Technology Department Network Operations Group
.-- -... . .- --. --..
Re: Regular Expression Trouble
On 12/10/05, Martin McCormick [EMAIL PROTECTED] wrote:
> After reading a bit about extended regular expressions and having a few actually work correctly in sed scripts, I tried one in egrep and it isn't working although there are no errors.
>
> I was hoping to get only the A records from a dns zone file so the expression I used is:
>
> egrep [[:space:]IN[:space:]A[:space:]] zone_file > h0
>
> Had it worked, all that would have appeared in the h0 file was all the A or Address records. Instead, I get those plus almost everything else in the file. It is obviously not filtering correctly.
>
> Plain grep produces a 0-length output file so that is not too useful either. Putting double quotes around the RE didn't help either. It seems to match almost everything.
>
> Thanks for any good ideas.

Isn't it a character class consisting of a white-space and three letters?
Re: Regular Expression Trouble
At 02:12 PM 12/9/2005, Martin McCormick wrote:
> After reading a bit about extended regular expressions and having a few actually work correctly in sed scripts, I tried one in egrep and it isn't working although there are no errors.
>
> I was hoping to get only the A records from a dns zone file so the expression I used is:
>
> egrep [[:space:]IN[:space:]A[:space:]] zone_file > h0
>
> Had it worked, all that would have appeared in the h0 file was all the A or Address records. Instead, I get those plus almost everything else in the file. It is obviously not filtering correctly.
>
> Plain grep produces a 0-length output file so that is not too useful either. Putting double quotes around the RE didn't help either. It seems to match almost everything.
>
> Thanks for any good ideas.

try:

egrep IN[^[:alnum:]]+A zone_file

-Glenn
Re: Regular Expression Trouble
in message [EMAIL PROTECTED], wrote Glenn Dawson thusly...
> At 02:12 PM 12/9/2005, Martin McCormick wrote:
>> I was hoping to get only the A records from a dns zone file so the expression I used is:
>>
>> egrep [[:space:]IN[:space:]A[:space:]] zone_file > h0
>>
>> Had it worked, all that would have appeared in the h0 file was all the A or Address records. Instead, I get those plus almost everything else in the file.
>
> try:
>
> egrep IN[^[:alnum:]]+A zone_file

For even finer results, use word boundaries ...

egrep '\bIN[^[:alnum:]]+A\b' file
egrep '\<IN[[:space:]]+A\>' file

- Parv

--
Re: Regular Expression Trouble
Martin McCormick wrote:
> After reading a bit about extended regular expressions and having a few actually work correctly in sed scripts, I tried one in egrep and it isn't working although there are no errors.
>
> I was hoping to get only the A records from a dns zone file so the expression I used is:
>
> egrep [[:space:]IN[:space:]A[:space:]] zone_file > h0

If you're using vi, put your cursor on that very first '[' and bounce on the % key for a while; see if anything occurs to you. If not, you could probably use a refresher course on that little sub-syntax of regular expressions called character classes.

> It seems to match almost everything.

The regex [[:space:]IN[:space:]A[:space:]] is composed entirely of one large character class. Classes are logical sets and have no interest in the order or quantity of their contents, so it's reduced to [[:space:]AIN].

When fed to egrep, it says, "match any line which contains any of these 4 entities." Most lines will contain a space character, if not the 3 letters, so your output makes sense.

As [:space:] only has meaning inside brackets itself, it's important to enclose each individual occurrence inside its own additional set of brackets.

egrep [[:space:]]IN[[:space:]]A[[:space:]] zone_file > h0

HTH,
Tim Hammerquist
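To make the difference concrete, here is a minimal sketch that counts matching lines for both forms of the expression. The zone file contents are invented for illustration, and the patterns are quoted so the shell leaves the brackets alone.

```shell
# Hypothetical zone data; real zone files vary in layout and spacing.
zone=$(mktemp)
cat > "$zone" <<'EOF'
example.org. IN SOA ns1.example.org. admin.example.org. 1 2 3 4 5
www IN A 192.0.2.10
mail IN MX 10 mail.example.org.
ftp IN CNAME www
EOF

# One big character class: any single whitespace, I, N or A anywhere on
# the line is enough, so every line matches.
loose=$(grep -Ec '[[:space:]IN[:space:]A[:space:]]' "$zone")

# Each [:space:] in its own brackets: whitespace, "IN", whitespace, "A",
# whitespace, in that order. Only the A record matches.
strict=$(grep -Ec '[[:space:]]IN[[:space:]]A[[:space:]]' "$zone")

printf 'single class: %s lines\nseparate classes: %s lines\n' "$loose" "$strict"
rm -f "$zone"
```

Note that the corrected form expects exactly one whitespace character on each side of IN and A; files that pad columns with several spaces or tabs would need [[:space:]]+ instead.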