Re: help with a stat script

2018-07-12 Thread Uri Guttman

On 07/12/2018 11:40 PM, Lauren C. wrote:

Hi Uri,

I was reading this page:
https://www.rexegg.com/regex-lookarounds.html

The content of "Mastering Lookahead and Lookbehind" confuses me.

(?=foo)
(?<=foo)
(?!foo)
(?<!foo)

i suggest you don't study lookarounds until you are stronger with basic 
regex stuff. they are useful but not needed that often. you should start 
with simpler stuff like character classes and their shortcuts, grouping 
and grabbing and quantifiers (repeat counts). then move on to simple 
zero-width assertions and other stuff. after you are very comfortable 
with all that, there are plenty of deeper things to learn like 
lookaround. walk before you run! :)


the site you list above seems like it is well written but its ordering 
of lessons is way too fast and wrong IMO.


i highly recommend you read the official perl tutorial on regexes 
(mentioned by someone else earlier)


https://perldoc.perl.org/perlretut.html

it has the right pace and topic order to learn simpler and more common 
things first and builds on those. the site you found is more like a 
firehose and your asking about lookaround is why it isn't a good tutorial.


uri

--
To unsubscribe, e-mail: beginners-unsubscr...@perl.org
For additional commands, e-mail: beginners-h...@perl.org
http://learn.perl.org/




Re: help with a stat script

2018-07-12 Thread Lauren C.

Hi Uri,

I was reading this page:
https://www.rexegg.com/regex-lookarounds.html

The content of "Mastering Lookahead and Lookbehind" confuses me.

(?=foo)
(?<=foo)
(?!foo)
(?<!foo)






Re: help with a stat script

2018-07-12 Thread Uri Guttman

On 07/12/2018 08:53 PM, Lauren C. wrote:

OK I see, thanks Gil.
I think the main problem is I don't know much about regex.
I will re-learn them today.

heh, relearning regexes will take a lifetime, not just one day! :)

but seriously, regexes are a key feature in perl and most modern 
languages. it is hard to do any text or data processing without them. i 
recommend you read those tutorials mentioned earlier and possibly other 
materials. stay away from most 'perl' or 'regex' tutorials on the net as 
many are very poorly written and full of mistakes.


and if you need more help with regexes, emailing here is a good thing!

uri





Re: help with a stat script

2018-07-12 Thread Lauren C.

Thanks John.

Those symbols drove me completely crazy.
As you explained, some are regex metacharacters and some are ordinary 
characters; the difference isn't clear to me because of my poor knowledge of regex.


Yes, I will study them more.

thanks.





Re: help with a stat script

2018-07-12 Thread Lauren C.

OK I see, thanks Gil.
I think the main problem is I don't know much about regex.
I will re-learn them today.





Re: help with a stat script

2018-07-12 Thread Lauren C.

Thanks Jim. That explains it clearly.





Re: help with a stat script

2018-07-12 Thread John W. Krahn
On Thu, 2018-07-12 at 19:35 +0800, Lauren C. wrote:
> 
> My website is powered by Apache and PHP; its access log looks like this:
> 
> xx.xx.xx.xx - - [12/Jul/2018:19:29:43 +0800] "GET
> /2018/07/06/antique-internet/ HTTP/1.1" 200 5489 "https://miscnote.net/"
> "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_6) AppleWebKit/537.36
> (KHTML, like Gecko) Chrome/67.0.3396.99 Safari/537.36"
> 
> Here is a Perl script used to produce stats from this log:
> 
> tail -f /var/log/apache2/access.log|perl -nle 'next unless m{^(\S+) - - \[(\S+).*\] \"GET (.*?/)\s+}; printf "%-20s%-40s%-40s\n",$1,$3,$2'
> 
> I am totally confused by it.
> What do m{...} and its contents stand for?



m{^

Start with the (^) beginning of line anchor, the following pattern must
match at the beginning of the line.

(\S+)

Match one or more non-whitespace characters and store the match in the
$1 variable.  This matches the "xx.xx.xx.xx" portion of your string.

' - - \['

Match the literal characters SPACE HYPHEN SPACE HYPHEN SPACE
LEFT-BRACKET.

(\S+)

Match one or more non-whitespace characters and store the match in the
$2 variable.  This matches the "12/Jul/2018:19:29:43" portion of your
string.

'.*\] \"GET '

Match zero or more non-newline characters followed by the literal
string '] "GET '.

(.*?/)

Match as few as possible non-newline characters followed by a '/'
character and store the match in the $3 variable.  This matches the
"/2018/07/06/antique-internet/" portion of your string.

\s+}

And finally, match one or more whitespace characters so that the
previous non-greedy pattern will match correctly.  The + quantifier is
redundant here, because nothing follows it in the pattern, so it could
simply be:

\s}
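
A minimal sketch of the walkthrough above, assuming the sample log line from the original question joined onto a single line (the user-agent string is shortened here for brevity). It writes the same pattern with the /x flag so that each piece can carry a comment, then checks that ending the pattern with \s+ gives the same captures as ending it with \s. The variable names are illustrative only.

```perl
use strict;
use warnings;

# Sample log line from the original question, joined onto one line.
my $line = 'xx.xx.xx.xx - - [12/Jul/2018:19:29:43 +0800] "GET /2018/07/06/antique-internet/ HTTP/1.1" 200 5489 "https://miscnote.net/" "Mozilla/5.0"';

# The same pattern under /x. Plain spaces are ignored in this mode,
# so each literal space is written as "\ ".
my $re = qr{
    ^ (\S+)          # capture 1: client address, up to the first whitespace
    \ -\ -\ \[       # literal space, hyphen, space, hyphen, space, left bracket
    (\S+)            # capture 2: timestamp, up to the space before the zone
    .* \] \ "GET\    # rest of the bracket, then the literal ] "GET and a space
    (.*?/)           # capture 3: shortest run that ends in a slash ...
    \s               # ... and is followed by one whitespace character
}x;

my ($ip, $time, $path) = $line =~ $re;
print "$ip\n$time\n$path\n";

# The original pattern, ending in \s+, yields the same three captures.
my @plus = $line =~ m{^(\S+) - - \[(\S+).*\] "GET (.*?/)\s+};
print "same captures with \\s+\n" if "@plus" eq "$ip $time $path";
```

Under /x, a "$" or "@" inside a comment would still be interpolated by Perl, which is why the comments avoid writing the capture variables by name.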



John





Re: help with a stat script

2018-07-12 Thread Jim Gibson


> On Jul 12, 2018, at 5:50 AM, Lauren C. wrote:
> 
> Thanks for the kind help.
> Do you know what the expression in { } stands for?
> 
> ^(\S+) - - \[(\S+).*\] \"GET (.*?/)\s+

Here is a breakdown:

^        Anchor the match at the beginning of the string
(\S+)    Match one or more consecutive non-whitespace characters and save
         them in the $1 variable
' - - '  Match the literal string ' - - ' (space, hyphen, space, hyphen,
         space)
\[       Match the character '['
(\S+)    Match one or more consecutive non-whitespace characters and save
         them in the $2 variable
.*       Match zero or more consecutive characters (any character except
         newline)
\]       Match the character ']'
(space)  Match a space character
\"       Match the character '"'
GET      Match the literal string 'GET ' (with a space at the end)
(.*?/)   Match the shortest string of consecutive characters that ends in
         '/' and is followed by whitespace, and save it in $3
\s+      Match one or more consecutive whitespace characters

If all of the above entities are matched, then the regular expression 
evaluation returns true and the $1, $2, and $3 variables are assigned to their 
captured matches.
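
As a rough sketch of how the one-liner uses those captures, here is the same logic written as a small script. In the one-liner, -n wraps the code in a loop over the input lines, -l strips each line ending, and -e supplies the code; printf still needs its own "\n" because -l only affects print. The two input lines below are assumptions made for illustration (the POST request and its path are hypothetical), standing in for lines arriving from tail -f.

```perl
use strict;
use warnings;

# Hypothetical sample input: one GET request that matches and one POST
# request that does not.
my @lines = (
    'xx.xx.xx.xx - - [12/Jul/2018:19:29:43 +0800] "GET /2018/07/06/antique-internet/ HTTP/1.1" 200 5489',
    'xx.xx.xx.xx - - [12/Jul/2018:19:30:01 +0800] "POST /wp-login.php HTTP/1.1" 200 1234',
);

my @report;
for my $line (@lines) {
    # Skip lines that do not match, as "next unless" does in the one-liner.
    next unless $line =~ m{^(\S+) - - \[(\S+).*\] "GET (.*?/)\s+};

    # Left-justified, space-padded columns: address (20), path (40), time (40).
    push @report, sprintf "%-20s%-40s%-40s", $1, $3, $2;
}

print "$_\n" for @report;
```

Only the GET line produces a row; the POST line fails the match and is skipped, so its captures are never used.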





Re: help with a stat script

2018-07-12 Thread Gil Magno
2018-07-12 20:50:22 +0800 Lauren C.:
> Thanks for the kind help.
> Do you know what the expression in { } stands for?
> 
> ^(\S+) - - \[(\S+).*\] \"GET (.*?/)\s+

Hi, Lauren

This is quickly explained in 
http://perldoc.perl.org/perlrequick.html#Using-character-classes

\s (lowercase) stands for a whitespace character. \S (uppercase) stands for the 
opposite of \s, that is, any non-whitespace character. So

$name = "lauren";
if ($name =~ m{\s}) { print 'it matched' }

This will not match, because there's no "whitespace" in the string. But this

$name = "lauren";
if ($name =~ m{\S}) { print 'it matched' }

will match, because in the string there is a character which is *not* 
"whitespace".

For the ^ [] and .*? in the regex, the pages I mentioned in my previous email will help you.
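
A small sketch of the same idea, trying \s and \S against a few strings. The helper name classify and the test strings are made up for illustration.

```perl
use strict;
use warnings;

# Report whether a string contains any whitespace character (\s) and
# any non-whitespace character (\S).
sub classify {
    my ($text) = @_;
    my $has_space     = $text =~ m{\s} ? 1 : 0;
    my $has_non_space = $text =~ m{\S} ? 1 : 0;
    return ($has_space, $has_non_space);
}

# Hypothetical test strings, chosen only for illustration.
for my $text ('lauren', 'lauren c', '   ') {
    my ($space, $non_space) = classify($text);
    printf "%-12s has whitespace: %d, has non-whitespace: %d\n",
        qq{"$text"}, $space, $non_space;
}
```

A string can match both, one, or (if it is empty) neither: \s and \S each ask whether at least one such character exists somewhere in the string.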

Best

gil



Re: help with a stat script

2018-07-12 Thread Lauren C.

Thanks for the kind help.
Do you know what the expression in { } stands for?

^(\S+) - - \[(\S+).*\] \"GET (.*?/)\s+







Re: help with a stat script

2018-07-12 Thread Lauren C.

thanks Magno. i will check it.





Re: help with a stat script

2018-07-12 Thread Gil Magno
2018-07-12 19:35:14 +0800 Lauren C.:
> Hello,
> 
> My website is powered by Apache and PHP; its access log looks like this:
> 
> xx.xx.xx.xx - - [12/Jul/2018:19:29:43 +0800] "GET
> /2018/07/06/antique-internet/ HTTP/1.1" 200 5489 "https://miscnote.net/"
> "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_6) AppleWebKit/537.36 (KHTML,
> like Gecko) Chrome/67.0.3396.99 Safari/537.36"
> 
> Here is a Perl script used to produce stats from this log:
> 
> tail -f /var/log/apache2/access.log|perl -nle 'next unless m{^(\S+) - - \[(\S+).*\] \"GET (.*?/)\s+}; printf "%-20s%-40s%-40s\n",$1,$3,$2'
> 
> I am totally confused by it.
> What do m{...} and its contents stand for?
> Can you help explain it?

Hi, Lauren

The m{...} is a regular expression (regexp). If you are not familiar with
regexps in Perl, I advise you to read these pages:

- http://perldoc.perl.org/perlintro.html#Regular-expressions
- http://perldoc.perl.org/perlrequick.html

> thanks in advance.
> 


Re: help with a stat script

2018-07-12 Thread Илья Рассадин

Hi!

"m{ pattern }" is a regular expression used to parse the log line.

It is equivalent to "/ pattern /". Using a different delimiter is 
convenient here because the "/" symbol must normally be escaped with a 
backslash "\", but with another delimiter we can leave "/" 
unescaped, which makes the regex more readable.
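
A short sketch of the delimiter point, assuming the request path from the log line above. The date-matching pattern is only an illustration and is not part of the original script.

```perl
use strict;
use warnings;

my $path = '/2018/07/06/antique-internet/';

# With the default "/" delimiter, every "/" in the pattern needs a backslash.
my $slashes = $path =~ /^\/(\d+)\/(\d+)\/(\d+)\// ? "$1-$2-$3" : 'no match';

# With m{...} the same pattern needs no escaping.
my $braces  = $path =~ m{^/(\d+)/(\d+)/(\d+)/} ? "$1-$2-$3" : 'no match';

print "$slashes\n$braces\n";
```

Both forms compile to the same pattern and produce the same captures; only the amount of escaping differs.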


You can further explore regex with this site https://regex101.com/r/4CGCcB/2






help with a stat script

2018-07-12 Thread Lauren C.

Hello,

My website is powered by Apache and PHP; its access log looks like this:

xx.xx.xx.xx - - [12/Jul/2018:19:29:43 +0800] "GET 
/2018/07/06/antique-internet/ HTTP/1.1" 200 5489 "https://miscnote.net/" 
"Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_6) AppleWebKit/537.36 
(KHTML, like Gecko) Chrome/67.0.3396.99 Safari/537.36"


Here is a Perl script used to produce stats from this log:

tail -f /var/log/apache2/access.log|perl -nle 'next unless m{^(\S+) - - \[(\S+).*\] \"GET (.*?/)\s+}; printf "%-20s%-40s%-40s\n",$1,$3,$2'


I am totally confused by it.
What do m{...} and its contents stand for?
Can you help explain it?

thanks in advance.
