Re: help with a stat script
On 07/12/2018 11:40 PM, Lauren C. wrote:
> Hi Uri,
>
> I was reading this page:
> https://www.rexegg.com/regex-lookarounds.html
>
> The content of "Mastering Lookahead and Lookbehind" confuses me:
>
> (?=foo)
> (?<=foo)
> (?!foo)
> (?<!foo)

i suggest you don't study lookarounds until you are stronger with basic regex stuff. they are useful but not needed that often. you should start with simpler stuff like character classes and their shortcuts, grouping and grabbing, and quantifiers (repeat counts). then move on to simple zero-width assertions and other stuff. after you are very comfortable with all that, there are plenty of deeper things to learn like lookaround. walk before you run! :)

the site you list above seems well written, but its ordering of lessons is way too fast and wrong IMO. i highly recommend you read the official perl tutorial on regexes (mentioned by someone else earlier):

https://perldoc.perl.org/perlretut.html

it has the right pace and topic order: you learn the simpler and more common things first and it builds on those. the site you found is more like a firehose, and the fact that you are already asking about lookaround is why it isn't a good tutorial.

uri

--
To unsubscribe, e-mail: beginners-unsubscr...@perl.org
For additional commands, e-mail: beginners-h...@perl.org
http://learn.perl.org/
Re: help with a stat script
Hi Uri,

I was reading this page:
https://www.rexegg.com/regex-lookarounds.html

The content of "Mastering Lookahead and Lookbehind" confuses me:

(?=foo)
(?<=foo)
(?!foo)
(?<!foo)

> but seriously, regexes are a key feature in perl and most modern
> languages. it is hard to do any text or data processing without them.
> i recommend you read those tutorials mentioned earlier and possibly
> other materials. stay away from most 'perl' or 'regex' tutorials on
> the net as many are very poorly written and full of mistakes.
Re: help with a stat script
On 07/12/2018 08:53 PM, Lauren C. wrote:
> OK I see, thanks Gil.
> I think the main problem is I don't know much about regex.
> I will re-learn them today.

heh, relearning regexes will take a lifetime, not just one day! :)

but seriously, regexes are a key feature in perl and most modern languages. it is hard to do any text or data processing without them. i recommend you read those tutorials mentioned earlier and possibly other materials. stay away from most 'perl' or 'regex' tutorials on the net as many are very poorly written and full of mistakes.

and if you need more help with regexes, emailing here is a good thing!

uri
Re: help with a stat script
Thanks John. Those symbols drove me completely crazy.
As you explained, some are regex metacharacters and some are regular characters. That was not clear to me because of my poor knowledge of regex.
Yes, I will learn more about them. Thanks.

On 2018/7/13 (Friday) 2:23 AM, John W. Krahn wrote:
> On Thu, 2018-07-12 at 19:35 +0800, Lauren C. wrote:
> > My website is powered by Apache and PHP; its access log looks like this:
> >
> > xx.xx.xx.xx - - [12/Jul/2018:19:29:43 +0800] "GET /2018/07/06/antique-internet/ HTTP/1.1" 200 5489 "https://miscnote.net/" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/67.0.3396.99 Safari/537.36"
> >
> > A perl script for stat purposes on this log:
> >
> > tail -f /var/log/apache2/access.log|perl -nle 'next unless m{^(\S+) - - \[(\S+).*\] \"GET (.*?/)\s+}; printf "%-20s%-40s%-40s\n",$1,$3,$2'
> >
> > I was totally confused by it.
> > What do m{...} and its content stand for?
>
> m{^
>     Start with the beginning-of-line anchor (^); the following pattern
>     must match at the beginning of the line.
>
> (\S+)
>     Match one or more non-whitespace characters and store the match in
>     the $1 variable. This matches the "xx.xx.xx.xx" portion of your
>     string.
>
> ' - - \['
>     Match the literal characters SPACE HYPHEN SPACE HYPHEN SPACE
>     LEFT-BRACKET.
>
> (\S+)
>     Match one or more non-whitespace characters and store the match in
>     the $2 variable. This matches the "12/Jul/2018:19:29:43" portion of
>     your string.
>
> '.*\] \"GET '
>     Match zero or more non-newline characters followed by the literal
>     string '] "GET '.
>
> (.*?/)
>     Match as few non-newline characters as possible, followed by a '/'
>     character, and store the match in the $3 variable. This matches the
>     "/2018/07/06/antique-internet/" portion of your string.
>
> \s+}
>     And finally, match one or more whitespace characters so that the
>     previous non-greedy pattern will match correctly.
>
>     The + quantifier is redundant here, because nothing follows it in
>     the pattern, so it could simply be:
>
>     \s}
>
> John
Re: help with a stat script
OK I see, thanks Gil.
I think the main problem is I don't know much about regex.
I will re-learn them today.

On 2018/7/12 (Thursday) 10:02 PM, Gil Magno wrote:
> 2018-07-12 20:50:22 +0800 Lauren C.:
> > Thanks for the kind help.
> > Do you know what the expression in { } stands for?
> >
> > ^(\S+) - - \[(\S+).*\] \"GET (.*?/)\s+
>
> Hi, Lauren
>
> This is quickly explained in
> http://perldoc.perl.org/perlrequick.html#Using-character-classes
>
> \s (lowercase) stands for a whitespace character.
> \S (uppercase) stands for the opposite of \s: any character that is
> not whitespace.
>
> So
>
>     my $name = "lauren";
>     if ($name =~ m{\s}) { print 'it matched' }
>
> will not match, because there is no whitespace in the string. But this
>
>     my $name = "lauren";
>     if ($name =~ m{\S}) { print 'it matched' }
>
> will match, because the string contains a character which is *not*
> whitespace.
>
> For the ^, [] and .*? in the regex, the pages from my previous email
> will help you.
>
> Best
> gil
>
> > On 2018/7/12 (Thursday) 8:37 PM, Илья Рассадин wrote:
> > > "m{ pattern }" is a regular expression used to parse the log line.
> > >
> > > It is equivalent to "/ pattern /". Using a different delimiter is
> > > convenient here because the "/" character would otherwise have to
> > > be escaped with a backslash "\"; with another delimiter we can
> > > leave "/" unescaped, and the regex is more readable.
> > >
> > > You can further explore the regex with this site:
> > > https://regex101.com/r/4CGCcB/2
Re: help with a stat script
Thanks Jim. That explains it clearly.

On 2018/7/12 (Thursday) 10:00 PM, Jim Gibson wrote:
> On Jul 12, 2018, at 5:50 AM, Lauren C. wrote:
> > Thanks for the kind help.
> > Do you know what the expression in { } stands for?
> >
> > ^(\S+) - - \[(\S+).*\] \"GET (.*?/)\s+
>
> Here is a breakdown:
>
> ^        Start looking for matches at the beginning of the string
> (\S+)    Match a consecutive sequence of non-whitespace characters and
>          save it in the $1 variable
> ' - - '  Match the literal string ' - - '
> \[       Match the character '['
> (\S+)    Match a consecutive sequence of non-whitespace characters and
>          save it in the $2 variable
> .*       Match zero or more consecutive characters of any kind
>          (other than newline)
> \]       Match the character ']'
> (space)  Match a space character
> \"       Match the character '"'
> GET      Match the literal string 'GET ' (with a space at the end)
> (.*?/)   Match the shortest string of consecutive characters that ends
>          in '/' and is followed by whitespace, and save it in $3
> \s+      Match any consecutive sequence of whitespace characters
>
> If all of the above entities are matched, then the regular expression
> evaluation returns true and the $1, $2, and $3 variables are assigned
> their captured matches.
Re: help with a stat script
On Thu, 2018-07-12 at 19:35 +0800, Lauren C. wrote:
> My website is powered by Apache and PHP; its access log looks like this:
>
> xx.xx.xx.xx - - [12/Jul/2018:19:29:43 +0800] "GET /2018/07/06/antique-internet/ HTTP/1.1" 200 5489 "https://miscnote.net/" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/67.0.3396.99 Safari/537.36"
>
> A perl script for stat purposes on this log:
>
> tail -f /var/log/apache2/access.log|perl -nle 'next unless m{^(\S+) - - \[(\S+).*\] \"GET (.*?/)\s+}; printf "%-20s%-40s%-40s\n",$1,$3,$2'
>
> I was totally confused by it.
> What do m{...} and its content stand for?

m{^
    Start with the beginning-of-line anchor (^); the following pattern
    must match at the beginning of the line.

(\S+)
    Match one or more non-whitespace characters and store the match in
    the $1 variable. This matches the "xx.xx.xx.xx" portion of your
    string.

' - - \['
    Match the literal characters SPACE HYPHEN SPACE HYPHEN SPACE
    LEFT-BRACKET.

(\S+)
    Match one or more non-whitespace characters and store the match in
    the $2 variable. This matches the "12/Jul/2018:19:29:43" portion of
    your string.

'.*\] \"GET '
    Match zero or more non-newline characters followed by the literal
    string '] "GET '.

(.*?/)
    Match as few non-newline characters as possible, followed by a '/'
    character, and store the match in the $3 variable. This matches the
    "/2018/07/06/antique-internet/" portion of your string.

\s+}
    And finally, match one or more whitespace characters so that the
    previous non-greedy pattern will match correctly.

    The + quantifier is redundant here, because nothing follows it in
    the pattern, so it could simply be:

    \s}

John
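To see the captures John describes, here is a minimal sketch that runs the same pattern against the sample log line from the original post. The user-agent part of the line is shortened for readability, the variable names are only illustrative, and \" is written as a plain " (both match the same character). The last three statements check John's remark that \s alone gives the same captures as \s+.

```perl
use strict;
use warnings;

# Sample line from the original post (user agent shortened).
my $line = 'xx.xx.xx.xx - - [12/Jul/2018:19:29:43 +0800] '
         . '"GET /2018/07/06/antique-internet/ HTTP/1.1" 200 5489 '
         . '"https://miscnote.net/" "Mozilla/5.0"';

# Same pattern as in the one-liner.
my ($ip, $time, $path);
if ($line =~ m{^(\S+) - - \[(\S+).*\] "GET (.*?/)\s+}) {
    ($ip, $time, $path) = ($1, $2, $3);
    printf "%-20s%-40s%-40s\n", $ip, $path, $time;
}

# In list context a match returns its captures, so the two variants
# can be compared directly.
my @with_plus    = $line =~ m{^(\S+) - - \[(\S+).*\] "GET (.*?/)\s+};
my @without_plus = $line =~ m{^(\S+) - - \[(\S+).*\] "GET (.*?/)\s};
print "same captures\n" if "@with_plus" eq "@without_plus";
```

The printf call uses the same argument order as the one-liner ($1, $3, $2), so the request path is printed before the timestamp.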
Re: help with a stat script
> On Jul 12, 2018, at 5:50 AM, Lauren C. wrote:
>
> Thanks for the kind help.
> Do you know what the expression in { } stands for?
>
> ^(\S+) - - \[(\S+).*\] \"GET (.*?/)\s+

Here is a breakdown:

^        Start looking for matches at the beginning of the string
(\S+)    Match a consecutive sequence of non-whitespace characters and
         save it in the $1 variable
' - - '  Match the literal string ' - - '
\[       Match the character '['
(\S+)    Match a consecutive sequence of non-whitespace characters and
         save it in the $2 variable
.*       Match zero or more consecutive characters of any kind
         (other than newline)
\]       Match the character ']'
(space)  Match a space character
\"       Match the character '"'
GET      Match the literal string 'GET ' (with a space at the end)
(.*?/)   Match the shortest string of consecutive characters that ends
         in '/' and is followed by whitespace, and save it in $3
\s+      Match any consecutive sequence of whitespace characters

If all of the above entities are matched, then the regular expression evaluation returns true and the $1, $2, and $3 variables are assigned their captured matches.
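A breakdown like Jim's can also be kept next to the pattern itself by using the /x modifier, which lets a regex contain whitespace and comments. The following is only a sketch of that idea, not something from the thread: the name $log_re and the shortened sample line are assumptions made for illustration. Under /x a literal space has to be written as [ ], and the comments avoid the $ character because it would be interpolated.

```perl
use strict;
use warnings;

my $log_re = qr{
    ^ (\S+)        # capture 1: client address, up to the first whitespace
    [ ]-[ ]-[ ]    # the literal text " - - "
    \[ (\S+)       # an opening bracket, then capture 2: date and time
    .* \]          # the rest of the timestamp, then the closing bracket
    [ ]"GET[ ]     # a space, a double quote, GET, a space
    (.*?/)         # capture 3: shortest text ending in a slash ...
    \s+            # ... that is followed by whitespace
}x;

# Sample line from the original post, cut off after the response size.
my $line = 'xx.xx.xx.xx - - [12/Jul/2018:19:29:43 +0800] '
         . '"GET /2018/07/06/antique-internet/ HTTP/1.1" 200 5489';

# In list context the match returns the three captures.
my @fields = $line =~ $log_re;
print join(' | ', @fields), "\n";
```

The pattern matches the same text as the one-line form; only its layout differs.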
Re: help with a stat script
2018-07-12 20:50:22 +0800 Lauren C.:
> Thanks for the kind help.
> Do you know what the expression in { } stands for?
>
> ^(\S+) - - \[(\S+).*\] \"GET (.*?/)\s+

Hi, Lauren

This is quickly explained in
http://perldoc.perl.org/perlrequick.html#Using-character-classes

\s (lowercase) stands for a whitespace character.
\S (uppercase) stands for the opposite of \s: any character that is not whitespace.

So

    my $name = "lauren";
    if ($name =~ m{\s}) { print 'it matched' }

will not match, because there is no whitespace in the string. But this

    my $name = "lauren";
    if ($name =~ m{\S}) { print 'it matched' }

will match, because the string contains a character which is *not* whitespace.

For the ^, [] and .*? in the regex, the pages from my previous email will help you.

Best
gil

> On 2018/7/12 (Thursday) 8:37 PM, Илья Рассадин wrote:
> > "m{ pattern }" is a regular expression used to parse the log line.
> >
> > It is equivalent to "/ pattern /". Using a different delimiter is
> > convenient here because the "/" character would otherwise have to be
> > escaped with a backslash "\"; with another delimiter we can leave "/"
> > unescaped, and the regex is more readable.
> >
> > You can further explore the regex with this site:
> > https://regex101.com/r/4CGCcB/2
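As a small extension of Gil's two snippets, this sketch puts \s and \S side by side and adds the \S+ form used in the log pattern. The string "lauren c" and the variable names are made up for the illustration.

```perl
use strict;
use warnings;

my $name  = "lauren";      # no whitespace at all
my $entry = "lauren c";    # contains one space

my $name_has_space  = $name  =~ m{\s} ? 1 : 0;   # 0: nothing matches \s
my $entry_has_space = $entry =~ m{\s} ? 1 : 0;   # 1: the space matches \s

# \S+ grabs a run of non-whitespace characters and stops at the space,
# which is how the log pattern picks out the address at the start.
my ($first_word) = $entry =~ m{^(\S+)};

print "$name_has_space $entry_has_space $first_word\n";
```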
Re: help with a stat script
Thanks for the kind help.
Do you know what the expression in { } stands for?

^(\S+) - - \[(\S+).*\] \"GET (.*?/)\s+

On 2018/7/12 (Thursday) 8:37 PM, Илья Рассадин wrote:
> "m{ pattern }" is a regular expression used to parse the log line.
>
> It is equivalent to "/ pattern /". Using a different delimiter is
> convenient here because the "/" character would otherwise have to be
> escaped with a backslash "\"; with another delimiter we can leave "/"
> unescaped, and the regex is more readable.
>
> You can further explore the regex with this site:
> https://regex101.com/r/4CGCcB/2
Re: help with a stat script
Thanks Magno. I will check it.

On 2018/7/12 (Thursday) 8:13 PM, Gil Magno wrote:
> Hi, Lauren
>
> The m{...} is a regular expression (regexp). If you are not familiar
> with regexps in Perl, I advise you to read these pages:
>
> - http://perldoc.perl.org/perlintro.html#Regular-expressions
> - http://perldoc.perl.org/perlrequick.html
Re: help with a stat script
2018-07-12 19:35:14 +0800 Lauren C.:
> Hello,
>
> My website is powered by Apache and PHP; its access log looks like this:
>
> xx.xx.xx.xx - - [12/Jul/2018:19:29:43 +0800] "GET /2018/07/06/antique-internet/ HTTP/1.1" 200 5489 "https://miscnote.net/" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/67.0.3396.99 Safari/537.36"
>
> A perl script for stat purposes on this log:
>
> tail -f /var/log/apache2/access.log|perl -nle 'next unless m{^(\S+) - - \[(\S+).*\] \"GET (.*?/)\s+}; printf "%-20s%-40s%-40s\n",$1,$3,$2'
>
> I was totally confused by it.
> What do m{...} and its content stand for?
> Can you help explain it?

Hi, Lauren

The m{...} is a regular expression (regexp). If you are not familiar with regexps in Perl, I advise you to read these pages:

- http://perldoc.perl.org/perlintro.html#Regular-expressions
- http://perldoc.perl.org/perlrequick.html

> Thanks in advance.
Re: help with a stat script
Hi!

"m{ pattern }" is a regular expression used to parse the log line.

It is equivalent to "/ pattern /". Using a different delimiter is convenient here because the "/" character would otherwise have to be escaped with a backslash "\"; with another delimiter we can leave "/" unescaped, and the regex is more readable.

You can further explore the regex with this site:
https://regex101.com/r/4CGCcB/2

On 7/12/18 2:35 PM, Lauren C. wrote:
> Hello,
>
> My website is powered by Apache and PHP; its access log looks like this:
>
> xx.xx.xx.xx - - [12/Jul/2018:19:29:43 +0800] "GET /2018/07/06/antique-internet/ HTTP/1.1" 200 5489 "https://miscnote.net/" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/67.0.3396.99 Safari/537.36"
>
> A perl script for stat purposes on this log:
>
> tail -f /var/log/apache2/access.log|perl -nle 'next unless m{^(\S+) - - \[(\S+).*\] \"GET (.*?/)\s+}; printf "%-20s%-40s%-40s\n",$1,$3,$2'
>
> I was totally confused by it.
> What do m{...} and its content stand for?
> Can you help explain it?
>
> Thanks in advance.
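The point about delimiters can be checked with a short sketch. The request string is taken from the sample log line; the variable names are assumptions made for the illustration. Both statements use the same pattern, once with / as the delimiter and once with braces.

```perl
use strict;
use warnings;

my $request = 'GET /2018/07/06/antique-internet/ HTTP/1.1';

# With / as the delimiter, every literal slash needs a backslash.
my ($with_slashes) = $request =~ /^GET (\/.*?\/)\s/;

# With m{...}, the slashes can stay as they are.
my ($with_braces) = $request =~ m{^GET (/.*?/)\s};

print "both forms captured: $with_braces\n"
    if $with_slashes eq $with_braces;
```

The leading m is optional only when the delimiter is /; with any other delimiter, such as braces, it is required.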
help with a stat script
Hello,

My website is powered by Apache and PHP; its access log looks like this:

xx.xx.xx.xx - - [12/Jul/2018:19:29:43 +0800] "GET /2018/07/06/antique-internet/ HTTP/1.1" 200 5489 "https://miscnote.net/" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/67.0.3396.99 Safari/537.36"

A perl script for stat purposes on this log:

tail -f /var/log/apache2/access.log|perl -nle 'next unless m{^(\S+) - - \[(\S+).*\] \"GET (.*?/)\s+}; printf "%-20s%-40s%-40s\n",$1,$3,$2'

I was totally confused by it.
What do m{...} and its content stand for?
Can you help explain it?

Thanks in advance.
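For readers who find the one-liner hard to follow, here is a rough sketch of the same logic written out as a script. It is an illustration under stated assumptions, not the original author's code: format_hit is a hypothetical helper name, the first sample line is the one from this post cut off after the response size, and the second sample line is made up to show a request that gets skipped.

```perl
use strict;
use warnings;

# Returns the formatted report line for a matching GET request, or
# undef when the line does not match (the "next unless" part of the
# one-liner). format_hit is a hypothetical name.
sub format_hit {
    my ($line) = @_;
    return undef unless $line =~ m{^(\S+) - - \[(\S+).*\] "GET (.*?/)\s+};
    # Same column widths and order as the printf in the one-liner.
    return sprintf "%-20s%-40s%-40s", $1, $3, $2;
}

my @sample = (
    'xx.xx.xx.xx - - [12/Jul/2018:19:29:43 +0800] "GET /2018/07/06/antique-internet/ HTTP/1.1" 200 5489',
    # Made-up line: not a GET request, so it is skipped.
    'xx.xx.xx.xx - - [12/Jul/2018:19:30:00 +0800] "POST /some/form/ HTTP/1.1" 200 100',
);

for my $line (@sample) {
    my $report = format_hit($line);
    print "$report\n" if defined $report;
}
```

In the one-liner, the -n switch supplies the loop over input lines and -l removes the trailing newline from each line, so replacing the loop over @sample with a loop that reads from STDIN would give a script that can sit at the end of the same tail -f pipe.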