Re: Riddle me this: grep / regx experts

2018-02-03 Thread R. G. Newbury

Tim wrote:
> Allegedly, on or about 2 February 2018, R. G. Newbury sent:
> > I am cleaning up some html code, using sed to standardize the
> > formatting. I was searching for specific instances of code to amend
> > using grep.
>
> In case you're not aware of it, there's a HTML tidy command that
> neatens up HTML.

I am using sed basically for search and replace.
I already use tidy, but it does not deal with my problem, which is that
the text has multiple variations in the *text* formatting in the many
different files. This screws up the parsing and requires normalization.

Tidy does not touch that. Unfortunately.

Geoff

 R. Geoffrey Newbury
 954 Owenwood Drive
 Mississauga, Ontario, L5H 3J2

  t905-271-9600 newb...@mandamus.org
___
users mailing list -- users@lists.fedoraproject.org
To unsubscribe send an email to users-le...@lists.fedoraproject.org


Re: Riddle me this: grep / regx experts

2018-02-02 Thread Patrick O'Callaghan
On Fri, 2018-02-02 at 12:32 -0500, R. G. Newbury wrote:
> Thanks to all for the quick responses. I *tried* to RTFM but that was
> not clear, even on a re-read.  I took [0-9]* as multiple instances of
> [0-9] but NOT zero instances.

From 'man grep':

Repetition
   A regular expression may be followed by one of several repetition
   operators:
   ?      The preceding item is optional and matched at most once.
   *      The preceding item will be matched zero or more times.
   +      The preceding item will be matched one or more times.
   {n}    The preceding item is matched exactly n times.
   {n,}   The preceding item is matched n or more times.
   {,m}   The preceding item is matched at most m times.  This is a GNU
          extension.
   {n,m}  The preceding item is matched at least n times, but not more
          than m times.

poc
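
The repetition operators in the man page excerpt above can be checked at a shell prompt. The following is a minimal sketch, not part of the original message: the sample strings and the `match` helper are made up for illustration, and it assumes a grep that supports -E and -x.

```shell
# Hypothetical test data: one candidate string per line.
lines=$(printf '%s\n' b ab aab aaab)

# match PATTERN: print, space-separated, the lines that PATTERN matches in full.
# -E selects extended regex syntax; -x requires the whole line to match.
match() {
    printf '%s\n' "$lines" | grep -Ex "$1" | paste -s -d ' ' -
}

match 'a*b'       # zero or more a:   b ab aab aaab
match 'a+b'       # one or more a:    ab aab aaab
match 'a?b'       # at most one a:    b ab
match 'a{2}b'     # exactly two a:    aab
match 'a{2,}b'    # two or more a:    aab aaab
match 'a{1,2}b'   # one or two a:     ab aab
```

Only 'a*b' and 'a?b' accept the bare "b", which is the zero-repetition case that caused the confusion in this thread.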


Re: Riddle me this: grep / regx experts

2018-02-02 Thread Tim
Allegedly, on or about 2 February 2018, R. G. Newbury sent:
> I am cleaning up some html code, using sed to standardize the 
> formatting. I was searching for specific instances of code to amend 
> using grep.

In case you're not aware of it, there's a HTML tidy command that
neatens up HTML.

dnf install tidy

-- 
[tim@localhost ~]$ uname -rsvp
Linux 4.14.14-200.fc26.x86_64 #1 SMP Fri Jan 19 13:27:06 UTC 2018 x86_64

Boilerplate:  All mail to my mailbox is automatically deleted.
There is no point trying to privately email me, I only get to see
the messages posted to the mailing list.

ZNQR LBH YBBX!


Re: Riddle me this: grep / regx experts

2018-02-02 Thread R. G. Newbury

On Fri, Feb 02, 2018 at 11:04:01AM -0500, R. G. Newbury wrote:
> A bug in regx handling???
>
> I am cleaning up some html code
> .
> # grep -h '[0-9]*s[0-9]*">' temp
> Returns the example line with the 's[0-9]">' highlighted.
>
> Can anyone explain what is happening? This isn't politics so the group
> [0-9] should not equal [0-9"#]. Or even [0-9\"\#].
> .

Date: Fri, 2 Feb 2018 10:14:37 -0600 From: Chris Adams
> A * in a regex is "0 or more of the previous", so basically you are just
> matching 's[0-9]*">' (because there will always be at least 0 of the
> [0-9] part at the start).
>
> If you really mean "1 or more", you can use an extended regex (the -E
> argument to grep/sed) and use + instead of *, so '[0-9]+s[0-9]*">'.

Date: Fri, 02 Feb 2018 16:15:37 + From: Patrick O'Callaghan
> In grep, * matches any number of instances, including 0. You want to
> use + rather than * to guarantee at least one digit.



Date: Fri, 2 Feb 2018 11:26:02 -0500 From: Jon LaBadie
> You are misunderstanding the "*".  It means any sequence of the
> associated character including a ZERO length sequence.
>
> So [0-9]*s matches "s (actually just the s) as it is a zero length
> sequence of digits followed by an s.  When you grep for [0-9]s, there
> must be at least one digit before the s (but any extra digits are not
> part of the match).  Sometimes the sequence [0-9][0-9]*s is useful to
> say "one or more digits before the s".
>
> jl

Thanks to all for the quick responses. I *tried* to RTFM but that was
not clear, even on a re-read.  I took [0-9]* as multiple instances of
[0-9] but NOT zero instances.


Geoff


Re: Riddle me this: grep / regx experts

2018-02-02 Thread Jon LaBadie
On Fri, Feb 02, 2018 at 11:04:01AM -0500, R. G. Newbury wrote:
> A bug in regx handling???
> 
> I am cleaning up some html code, using sed to standardize the formatting. I
> was searching for specific instances of code to amend using grep.
> I was looking for instances like  
> Example text in a file: ( here named, quite originally, temp )
> 8.
> 
> And # grep -h '[0-9]s[0-9]*">' temp
> Returns nothing (which is the expected result: there are no [0-9]s[0-9]">
> instances).
> 
> BUT!!!
> # grep -h '[0-9]*s[0-9]*">' temp
> Returns the example line with the 's[0-9]">' highlighted.
> 
> Note that the character before the 's' is either " or #
> 
> Can anyone explain what is happening? This isn't politics so the group
> [0-9] should not equal [0-9"#]. Or even [0-9\"\#].

You are misunderstanding the "*".  It means any sequence of the
associated character including a ZERO length sequence.

So [0-9]*s matches "s (actually just the s) as it is a zero length
sequence of digits followed by an s.  When you grep for [0-9]s, there
must be at least one digit before the s (but any extra digits are not
part of the match).  Sometimes the sequence [0-9][0-9]*s is useful to
say "one or more digits before the s".

jl
-- 
Jon H. LaBadie  jo...@jgcomp.com
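
To see which part of a line each pattern actually consumes, grep's -o option prints only the matched text. The following is a minimal sketch, not from the original message: the example line was lost from the archived post, so the `line` value here is a hypothetical stand-in, and -o is assumed to be available (it is in GNU grep).

```shell
# Hypothetical sample line; the real example did not survive in the archive.
line='<a href="#s8">8.</a>'

# [0-9]* may match zero digits, so the match simply starts at the s.
zero_ok=$(printf '%s\n' "$line" | grep -o '[0-9]*s[0-9]*">')

# [0-9]s needs a digit right before the s; there is none, so nothing matches
# (grep exits with status 1, which "|| true" absorbs).
one_digit=$(printf '%s\n' "$line" | grep -o '[0-9]s[0-9]*">' || true)

# Portable "one or more digits" in basic syntax: [0-9][0-9]*
one_or_more=$(printf '%s\n' 'id="12s8">' | grep -o '[0-9][0-9]*s[0-9]*">')

echo "zero_ok:     $zero_ok"       # s8">
echo "one_digit:   $one_digit"     # (empty)
echo "one_or_more: $one_or_more"   # 12s8">
```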


Re: Riddle me this: grep / regx experts

2018-02-02 Thread Patrick O'Callaghan
On Fri, 2018-02-02 at 11:04 -0500, R. G. Newbury wrote:
> # grep -h '[0-9]*s[0-9]*">' temp
> Returns the example line with the 's[0-9]">' highlighted.

In grep, * matches any number of instances, including 0. You want to
use + rather than * to guarantee at least one digit.

poc
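
One detail worth spelling out: in grep's default (basic) syntax an unescaped + is an ordinary character, so the + operator needs -E. The following is a minimal sketch under assumptions: the sample line and the `check` helper are hypothetical and only illustrate the difference.

```shell
# Hypothetical sample line, assumed for illustration only.
line='<a href="#s8">8.</a>'

# check ARGS...: report whether grep, called with ARGS, finds a match in $line.
check() {
    if printf '%s\n' "$line" | grep -q "$@"; then echo match; else echo "no match"; fi
}

star=$(check '[0-9]*s[0-9]*">')        # * allows zero digits before the s
plus=$(check -E '[0-9]+s[0-9]*">')     # + (extended syntax) demands a digit

# Without -E, + is taken literally: this pattern looks for a digit
# followed by a plus sign, and -c counts the matching lines.
literal=$(printf '%s\n' '3+s8">' | grep -c '[0-9]+s[0-9]*">')

echo "star: $star"                        # match
echo "plus: $plus"                        # no match
echo "literal + lines matched: $literal"  # 1
```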


Re: Riddle me this: grep / regx experts

2018-02-02 Thread Chris Adams
Once upon a time, R. G. Newbury said:
> # grep -h '[0-9]*s[0-9]*">' temp
> Returns the example line with the 's[0-9]">' highlighted.

A * in a regex is "0 or more of the previous", so basically you are just
matching 's[0-9]*">' (because there will always be at least 0 of the
[0-9] part at the start).

If you really mean "1 or more", you can use an extended regex (the -E
argument to grep/sed) and use + instead of *, so '[0-9]+s[0-9]*">'.

-- 
Chris Adams 
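
The same zero-length behaviour matters in sed substitutions, since that is where the original poster planned to use the pattern. The following is a minimal sketch, assuming a sed that accepts -E (GNU sed does); the input string and the replacement letter N are made up for illustration.

```shell
# Hypothetical input, assumed for illustration only.
text='abc123'

# Basic syntax: [0-9]* matches the empty string at the very start of the line,
# so the replacement is inserted there and the digits are left alone.
star_out=$(printf '%s\n' "$text" | sed 's/[0-9]*/N/')

# Extended syntax (-E): [0-9]+ needs at least one digit, so the first
# run of digits is what gets replaced.
plus_out=$(printf '%s\n' "$text" | sed -E 's/[0-9]+/N/')

echo "$star_out"   # Nabc123
echo "$plus_out"   # abcN
```

In a search-and-replace the * form can silently edit the wrong place, which is a good reason to prefer + or the [0-9][0-9]* spelling whenever at least one digit is required.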