Re: [gentoo-user] OT scripting - strip zero if between period and digit

2019-01-23 Thread David Haller
Hello,

On Wed, 23 Jan 2019, Adam Carter wrote:
>> $ printf '0.1.2.3 01.2.3.4 1.2.3.0 1.2.000.3\n' | \
>> sed 's/0*\([[:digit:]]\+\)/\1/g'
>> 0.1.2.3 1.2.3.4 1.2.3.0 1.2.0.3
>
>So [[:digit:]] is another way of writing [0-9] and the + just means another
>instance of the preceding expression, right, so your and Francois's
>solutions are functionally the same, and all the following are the same
>too, right?

Not quite.

[[:digit:]]+ == [0-9][0-9]*

Note too that [:digit:] respects the locale. I don't know where the
locale makes a difference, though; probably with Devanagari or the like.
I have simply trained myself to use classes rather than character
ranges. See man 7 regex for more classes.
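
The class-versus-range distinction can be illustrated in another tool. Python's `re` module has no `[[:digit:]]`, but in `str` patterns its `\d` matches any Unicode decimal digit, whereas `[0-9]` is ASCII-only. This is only an analogy to the POSIX locale behaviour, not a demonstration of it, and the sample strings are made up:

```python
import re

ascii_digits = "2019"
# The same number written with Devanagari digits (U+0966..U+096F)
devanagari_digits = "\u0968\u0966\u0967\u096f"

def matches_whole(pattern, text, flags=0):
    """True if the pattern matches the entire string."""
    return re.fullmatch(pattern, text, flags) is not None

range_ascii = matches_whole(r"[0-9]+", ascii_digits)       # range, ASCII input
range_deva = matches_whole(r"[0-9]+", devanagari_digits)   # range, Devanagari input
class_deva = matches_whole(r"\d+", devanagari_digits)      # class, Devanagari input
# re.ASCII restricts \d to [0-9] again
class_deva_ascii_only = matches_whole(r"\d+", devanagari_digits, re.ASCII)
```

Here the range rejects the Devanagari string while the class accepts it, unless the class is explicitly restricted to ASCII.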

HTH,
-dnh

-- 
"UNIX was not designed to stop people from doing stupid things, because
 that would also stop them from doing clever things."  -- Doug Gwyn



Re: [gentoo-user] OT scripting - strip zero if between period and digit

2019-01-23 Thread Michael Orlitzky

On 1/23/19 5:52 AM, Wols Lists wrote:
> I've just done a bit of digging, and would this work to match an octet?
>
> [0-9][0-9]?[0-9]?

It doesn't match 0123. Regardless, using [0-9] is destined to fail
because it will match things like 999 that also aren't an octet.
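
Both objections are easy to check. The sketch below renders the pattern in Python; the helper names `looks_like_octet` and `is_octet` are made up for illustration. It shows that the pattern only checks the shape (one to three digits), and that a numeric range check is what actually defines an octet:

```python
import re

OCTET_SHAPE = re.compile(r"[0-9][0-9]?[0-9]?")  # one to three digits

def looks_like_octet(text):
    """Shape check only: the whole string is one to three digits."""
    return OCTET_SHAPE.fullmatch(text) is not None

def is_octet(text):
    """Shape check plus the 0-255 range check."""
    return looks_like_octet(text) and int(text) <= 255
```

With these definitions, `looks_like_octet("999")` is true while `is_octet("999")` is false, and `looks_like_octet("0123")` is false because the pattern stops after three digits. An unanchored search on "0123" would still find the first three characters, "012".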




Re: [gentoo-user] OT scripting - strip zero if between period and digit

2019-01-23 Thread Wols Lists
On 23/01/19 07:37, Alexander Kapshuk wrote:
> On Wed, Jan 23, 2019 at 9:05 AM Paul Colquhoun
>  wrote:
>>
>> On Wednesday, 23 January 2019 5:52:57 PM AEDT Alexander Kapshuk wrote:
>>> On Wed, Jan 23, 2019 at 5:20 AM Adam Carter  wrote:
>>>>>> François-Xavier
>>>>>
>>>>> My bad, it should be:
>>>>>
>>>>> sed 's/0*\([0-9][0-9]*\)/\1/g'
>>>>>
>>>>> (tests are indeed needed!)
>>>>
>>>> Many thanks François. This is almost right, but it is also stripping zeros
>>>> that follow a letter, and I only want it to strip zeros that are
>>>> preceded by a period. There are no leading zeros in the first octet of
>>>> the IP so that case does not need to be handled.
>>>>
>>>> Does the \1 refer to what's in the ()'s? So anything that one would want
>>>> to carry through should be inside the ()'s and anything that's outside is
>>>> stripped, right?
>>> Would something like this do the trick?
>>> echo 198.088.062.01 | sed 's/\.0/./g'
>>> 198.88.62.1
>>
>> In a word, no.
>>
>> echo 198.088.0.01 | sed 's/\.0/./g'
>> 198.88..1
>>
>>
>> --
>> Reverend Paul Colquhoun, ULC. http://andor.dropbear.id.au/
>>   Asking for technical help in newsgroups?  Read this first:
>>  http://catb.org/~esr/faqs/smart-questions.html#intro
>>
>>
>>
>>
> 
> How about this one?
> 
> echo '198.088.0.01
> 198.088.062.01' | sed 's/\.0\([0-9][0-9]*\)/.\1/g'
> 198.88.0.1
> 198.88.62.1
> 

I've just done a bit of digging, and would this work to match an octet?

[0-9][0-9]?[0-9]?

I know ? normally matches a single character (in shell globs, at least),
but apparently in this syntax it means "0 or 1 occurrence of the
preceding expression". So that will detect a number consisting of at
most three digits.

I thought there must be a "detect a single optional character" operator
... :-)

Cheers,
Wol



Re: [gentoo-user] OT scripting - strip zero if between period and digit

2019-01-22 Thread Alexander Kapshuk
On Wed, Jan 23, 2019 at 9:05 AM Paul Colquhoun
 wrote:
>
> On Wednesday, 23 January 2019 5:52:57 PM AEDT Alexander Kapshuk wrote:
> > On Wed, Jan 23, 2019 at 5:20 AM Adam Carter  wrote:
> > >> > François-Xavier
> > >>
> > >> My bad, it should be:
> > >>
> > >> sed 's/0*\([0-9][0-9]*\)/\1/g'
> > >>
> > >> (tests are indeed needed!)
> > >
> > > Many thanks François. This is almost right, but it is also stripping zeros
> > > that follow a letter, and I only want it to strip zeros that are
> > > preceded by a period. There are no leading zeros in the first octet of
> > > the IP so that case does not need to be handled.
> > >
> > > Does the \1 refer to what's in the ()'s? So anything that one would want
> > > to carry through should be inside the ()'s and anything that's outside is
> > > stripped, right?
> > Would something like this do the trick?
> > echo 198.088.062.01 | sed 's/\.0/./g'
> > 198.88.62.1
>
> In a word, no.
>
> echo 198.088.0.01 | sed 's/\.0/./g'
> 198.88..1
>
>
> --
> Reverend Paul Colquhoun, ULC. http://andor.dropbear.id.au/
>   Asking for technical help in newsgroups?  Read this first:
>  http://catb.org/~esr/faqs/smart-questions.html#intro
>
>
>
>

How about this one?

echo '198.088.0.01
198.088.062.01' | sed 's/\.0\([0-9][0-9]*\)/.\1/g'
198.88.0.1
198.88.62.1
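
For comparison, here is a rough Python rendering of the two substitutions discussed in this subthread, plus a variant that removes every leading zero after a period rather than just one. The third pattern is an assumption about what is wanted, not something proposed in the thread, and the function names are illustrative:

```python
import re

def naive(text):
    # sed 's/\.0/./g': drops the first zero after every period
    return re.sub(r"\.0", ".", text)

def keep_lone_zero(text):
    # sed 's/\.0\([0-9][0-9]*\)/.\1/g': only fires if a digit follows the zero
    return re.sub(r"\.0([0-9][0-9]*)", r".\1", text)

def strip_all_leading(text):
    # assumption: strip every leading zero, but keep a lone "0" octet
    return re.sub(r"\.0+([0-9])", r".\1", text)
```

On 198.088.0.01 the first function gives 198.88..1 and the other two give 198.88.0.1. They differ on an octet such as .000: `keep_lone_zero` removes only one zero (1.2.000.3 becomes 1.2.00.3), while `strip_all_leading` reduces it to a single 0.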



Re: [gentoo-user] OT scripting - strip zero if between period and digit

2019-01-22 Thread Paul Colquhoun
On Wednesday, 23 January 2019 5:52:57 PM AEDT Alexander Kapshuk wrote:
> On Wed, Jan 23, 2019 at 5:20 AM Adam Carter  wrote:
> >> > François-Xavier
> >> 
> >> My bad, it should be:
> >> 
> >> sed 's/0*\([0-9][0-9]*\)/\1/g'
> >> 
> >> (tests are indeed needed!)
> > 
> > Many thanks François. This is almost right, but it is also stripping zeros
> > that follow a letter, and I only want it to strip zeros that are
> > preceded by a period. There are no leading zeros in the first octet of
> > the IP so that case does not need to be handled.
> > 
> > Does the \1 refer to what's in the ()'s? So anything that one would want
> > to carry through should be inside the ()'s and anything that's outside is
> > stripped, right?
> Would something like this do the trick?
> echo 198.088.062.01 | sed 's/\.0/./g'
> 198.88.62.1

In a word, no.

echo 198.088.0.01 | sed 's/\.0/./g'
198.88..1


-- 
Reverend Paul Colquhoun, ULC. http://andor.dropbear.id.au/
  Asking for technical help in newsgroups?  Read this first:
 http://catb.org/~esr/faqs/smart-questions.html#intro






Re: [gentoo-user] OT scripting - strip zero if between period and digit

2019-01-22 Thread Alexander Kapshuk
On Wed, Jan 23, 2019 at 5:20 AM Adam Carter  wrote:
>>
>> > François-Xavier
>> >
>> >
>>
>> My bad, it should be:
>>
>> sed 's/0*\([0-9][0-9]*\)/\1/g'
>>
>> (tests are indeed needed!)
>
>
> Many thanks François. This is almost right, but it is also stripping zeros
> that follow a letter, and I only want it to strip zeros that are preceded by
> a period. There are no leading zeros in the first octet of the IP so that
> case does not need to be handled.
>
> Does the \1 refer to what's in the ()'s? So anything that one would want to
> carry through should be inside the ()'s and anything that's outside is
> stripped, right?
>
>
>

Would something like this do the trick?
echo 198.088.062.01 | sed 's/\.0/./g'
198.88.62.1



Re: [gentoo-user] OT scripting - strip zero if between period and digit

2019-01-22 Thread François-Xavier CARTON

On 23/01/2019 at 04:19, Adam Carter wrote:
>> François-Xavier
>>
>> My bad, it should be:
>>
>> sed 's/0*\([0-9][0-9]*\)/\1/g'
>>
>> (tests are indeed needed!)
>
> Many thanks François. This is almost right, but it is also stripping
> zeros that follow a letter, and I only want it to strip zeros that are
> preceded by a period. There are no leading zeros in the first octet of
> the IP so that case does not need to be handled.
>
> Does the \1 refer to what's in the ()'s? So anything that one would want
> to carry through should be inside the ()'s and anything that's outside
> is stripped, right?






Yes, \1 is the content in (). But adding letters inside won't solve the
problem, e.g. "a01" will still be changed to "a1".


AFAIK, there is no way to express "start of line or a character" in sed,
but you could use two regexps, one starting with ^ (start of line), the
other with \. (dot):



sed 's/^0*\([0-9][0-9]*\)/\1/g;s/\.0*\([0-9][0-9]*\)/.\1/g'
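
Where the regex dialect allows an anchor inside an alternation, the two expressions can be folded into one. A minimal sketch using Python's `re` syntax (not sed's); the function name is illustrative:

```python
import re

# Group 1 is either the start of the string or a literal period;
# group 2 is the number that remains after 0* has skipped leading zeros.
LEADING_ZEROS = re.compile(r"(^|\.)0*([0-9]+)")

def strip_octet_zeros(line):
    """Strip leading zeros from numbers at line start or after a period."""
    return LEADING_ZEROS.sub(r"\1\2", line)
```

This leaves "a01" alone because the digits there follow neither the start of the line nor a period. It still rewrites any number that follows a period, so a decimal such as 3.05 would become 3.5.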



Re: [gentoo-user] OT scripting - strip zero if between period and digit

2019-01-22 Thread Paul Colquhoun
On Wednesday, 23 January 2019 2:32:43 PM AEDT Adam Carter wrote:
> > $ printf '0.1.2.3 01.2.3.4 1.2.3.0 1.2.000.3\n' | \
> > sed 's/0*\([[:digit:]]\+\)/\1/g'
> > 0.1.2.3 1.2.3.4 1.2.3.0 1.2.0.3
> 
> Hi David - thanks for that.
> 
> So [[:digit:]] is another way of writing [0-9] and the + just means another
> instance of the preceding expression, right, so your and Francois's
> solutions are functionally the same, and all the following are the same
> too, right?
> 
> [[:digit:]]+
> [[:digit:]][[:digit:]]
> [0-9]+
> [0-9][0-9]


Not quite.

A trailing '+' means "1 or more of the preceding item", while a trailing '*' 
means "0 or more".

[0-9]+ would match any string consisting of only digits, no matter how long,
but not an empty string. [0-9][0-9], by contrast, matches exactly two digits.
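
A quick way to see how the patterns from the question differ is to run them against strings of different lengths. The sketch below uses Python's `re.fullmatch`, where `+` and `*` have the meaning described above; `[[:digit:]]` is written as `[0-9]` because Python's `re` has no POSIX classes, and the sample strings are made up:

```python
import re

samples = ["", "7", "42", "2019"]

def accepted(pattern):
    """Return the samples that the pattern matches in full."""
    return [s for s in samples if re.fullmatch(pattern, s)]

one_or_more = accepted(r"[0-9]+")            # '+': one or more digits
zero_or_more = accepted(r"[0-9]*")           # '*': zero or more digits
exactly_two = accepted(r"[0-9][0-9]")        # two digits, no more, no less
digit_then_star = accepted(r"[0-9][0-9]*")   # same set as [0-9]+
```

Only `[0-9]*` accepts the empty string, only "42" satisfies `[0-9][0-9]`, and `[0-9][0-9]*` accepts exactly what `[0-9]+` does.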


-- 
Reverend Paul Colquhoun, ULC. http://andor.dropbear.id.au/
  Asking for technical help in newsgroups?  Read this first:
 http://catb.org/~esr/faqs/smart-questions.html#intro






Re: [gentoo-user] OT scripting - strip zero if between period and digit

2019-01-22 Thread Adam Carter
>
> $ printf '0.1.2.3 01.2.3.4 1.2.3.0 1.2.000.3\n' | \
> sed 's/0*\([[:digit:]]\+\)/\1/g'
> 0.1.2.3 1.2.3.4 1.2.3.0 1.2.0.3
>
>
>
Hi David - thanks for that.

So [[:digit:]] is another way of writing [0-9] and the + just means another
instance of the preceding expression, right, so your and Francois's
solutions are functionally the same, and all the following are the same
too, right?

[[:digit:]]+
[[:digit:]][[:digit:]]
[0-9]+
[0-9][0-9]


Re: [gentoo-user] OT scripting - strip zero if between period and digit

2019-01-22 Thread Adam Carter
On Wed, Jan 23, 2019 at 12:34 AM Michael Orlitzky  wrote:

> On 1/21/19 9:55 PM, David Haller wrote:
> >
> > $ printf '0.1.2.3 01.2.3.4 1.2.3.0 1.2.000.3\n' | \
> >  sed 's/0*\([[:digit:]]\+\)/\1/g'
> > 0.1.2.3 1.2.3.4 1.2.3.0 1.2.0.3
> >
>
> There are actually more than four examples that it needs to work on. And
> more to the point, this is going to destroy any other numbers it finds
> in the input. Phone numbers, zip codes, addresses, credit card numbers,
> timestamps, etc. will all get clobbered. It takes like 10 lines of
> python to do this right; it's silly to invest a ton of effort trying to
> come up with a regex solution that accidentally works.
>
>
Thanks Michael. The input data is constrained in ways I didn't list, so it
might be possible to get away with a regex, but I appreciate you
highlighting the risk of what sounds like a brittle approach.

I am hopeful that one day learning python will make it to the top of my
priority list.


Re: [gentoo-user] OT scripting - strip zero if between period and digit

2019-01-22 Thread Adam Carter
>
> > François-Xavier
> >
> >
>
> My bad, it should be:
>
> sed 's/0*\([0-9][0-9]*\)/\1/g'
>
> (tests are indeed needed!)
>

Many thanks François. This is almost right, but it is also stripping zeros
that follow a letter, and I only want it to strip zeros that are preceded
by a period. There are no leading zeros in the first octet of the IP so
that case does not need to be handled.

Does the \1 refer to what's in the ()'s? So anything that one would want to
carry through should be inside the ()'s and anything that's outside is
stripped, right?


Re: [gentoo-user] OT scripting - strip zero if between period and digit

2019-01-22 Thread Michael Orlitzky

On 1/21/19 9:55 PM, David Haller wrote:
> $ printf '0.1.2.3 01.2.3.4 1.2.3.0 1.2.000.3\n' | \
>  sed 's/0*\([[:digit:]]\+\)/\1/g'
> 0.1.2.3 1.2.3.4 1.2.3.0 1.2.0.3

There are actually more than four examples that it needs to work on. And
more to the point, this is going to destroy any other numbers it finds
in the input. Phone numbers, zip codes, addresses, credit card numbers,
timestamps, etc. will all get clobbered. It takes like 10 lines of
Python to do this right; it's silly to invest a ton of effort trying to
come up with a regex solution that accidentally works.
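
To make the risk concrete, here is a small Python sketch. The first function renders the quoted sed expression; the second is one possible shape for the "ten lines of Python" approach. Its details (the candidate pattern, the 0-255 check, the function names) are assumptions rather than anything specified in this thread:

```python
import re

def strip_everywhere(text):
    # Python rendering of: sed 's/0*\([[:digit:]]\+\)/\1/g'
    return re.sub(r"0*([0-9]+)", r"\1", text)

# Candidate: four dot-separated runs of 1-3 digits that are not glued to
# further digits or dots, so longer dotted sequences are left alone.
CANDIDATE = re.compile(r"(?<![0-9.])[0-9]{1,3}(?:\.[0-9]{1,3}){3}(?![0-9.])")

def fix_ipv4_only(text):
    def repl(match):
        octets = [int(part) for part in match.group().split(".")]
        if any(value > 255 for value in octets):
            return match.group()  # not an IPv4 address; leave untouched
        return ".".join(str(value) for value in octets)
    return CANDIDATE.sub(repl, text)
```

Given the made-up line "09:05 host 010.001.000.009 tel 0412", `strip_everywhere` also rewrites the timestamp and the phone number, while `fix_ipv4_only` changes only the address. One known limitation of this sketch: an address directly followed by a sentence-ending period is skipped.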




Re: [gentoo-user] OT scripting - strip zero if between period and digit

2019-01-22 Thread David Haller
Hello,

On Mon, 21 Jan 2019, Michael Orlitzky wrote:
>On 1/21/19 6:50 PM, Adam Carter wrote:
>> I need to clean up a file which has IP addresses with leading zeros in
>> some of the octets so I need to make, say, .09 into .9
>> 
>> How do I do that in sed/awk/whatever?
>
>The first thing you should do is construct a bunch of test cases, with all of
>the possible input representations and what you think the output
>representation should be. Then, you should write a program in something other
>than bash that passes all of the test cases. It's not as easy as it sounds;
>for example:
>
>  * What happens to 0.1.2.3?
>
>  * What happens to 01.2.3.4?
>
>  * What happens to 1.2.3.0?
>
>  * What happens to 1.2.000.3?
>
>You need a parser, not a regular expression. (You can do it with a regex, but
>it's going to be one of those comical twelve-page-long things.)

$ printf '0.1.2.3 01.2.3.4 1.2.3.0 1.2.000.3\n' | \
sed 's/0*\([[:digit:]]\+\)/\1/g'
0.1.2.3 1.2.3.4 1.2.3.0 1.2.0.3
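
The four questions and the output shown above can be turned into a table-driven check, which makes it easy to add further cases later. This is a sketch in Python: `strip_zeros` is an illustrative name, and the pattern is a direct rendering of the sed expression with `[[:digit:]]` written as `[0-9]`:

```python
import re

def strip_zeros(text):
    # Python rendering of: sed 's/0*\([[:digit:]]\+\)/\1/g'
    return re.sub(r"0*([0-9]+)", r"\1", text)

# (input, expected) pairs taken from the printf/sed run above
CASES = [
    ("0.1.2.3", "0.1.2.3"),
    ("01.2.3.4", "1.2.3.4"),
    ("1.2.3.0", "1.2.3.0"),
    ("1.2.000.3", "1.2.0.3"),
]

def failures():
    """Return (input, expected, actual) for every case that does not pass."""
    return [(given, want, strip_zeros(given))
            for given, want in CASES
            if strip_zeros(given) != want]
```

An empty list from `failures()` means all listed cases pass; it says nothing about inputs that are not in the table.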

HTH,
-dnh

-- 
printk(KERN_DEBUG "adintr: Why?\n");
linux-2.6.19/sound/oss/ad1848.c



Re: [gentoo-user] OT scripting - strip zero if between period and digit

2019-01-21 Thread François-Xavier CARTON

On 22/01/2019 at 03:05, François-Xavier CARTON wrote:
> On 22/01/2019 at 00:50, Adam Carter wrote:
>> I need to clean up a file which has IP addresses with leading zeros in
>> some of the octets so I need to make, say, .09 into .9
>>
>> How do I do that in sed/awk/whatever?
>
> I believe this should do:
>
> sed 's/0*\([0-9]\)/\1/g'
>
> e.g.
>
> $ sed 's/0*\([0-9]\)/\1/g' <

My bad, it should be:

sed 's/0*\([0-9][0-9]*\)/\1/g'

(tests are indeed needed!)

François-Xavier



Re: [gentoo-user] OT scripting - strip zero if between period and digit

2019-01-21 Thread François-Xavier CARTON

On 22/01/2019 at 00:50, Adam Carter wrote:
> I need to clean up a file which has IP addresses with leading zeros in
> some of the octets so I need to make, say, .09 into .9
>
> How do I do that in sed/awk/whatever?

I believe this should do:

sed 's/0*\([0-9]\)/\1/g'

e.g.

$ sed 's/0*\([0-9]\)/\1/g' <

Re: [gentoo-user] OT scripting - strip zero if between period and digit

2019-01-21 Thread Andrew Udvare
On 21/01/2019 18:50, Adam Carter wrote:
> I need to clean up a file which has IP addresses with leading zeros in
> some of the octets so I need to make, say, .09 into .9
> 
> How do I do that in sed/awk/whatever?

A regex would be difficult. A parser is what you want.

You could use Python's ipaddress module (Python 3.3+). It will fix your
IPs (below is all one line):

python -c $'import ipaddress, sys;\nfor x in sys.argv[1:]:
print(ipaddress.ip_address(x))' 1.02.3.4 001.002.003.004

Output:
1.2.3.4
1.2.3.4

Fix that for stdin:

python -c $'import ipaddress, sys;\nfor x in sys.stdin.readlines():
print(ipaddress.ip_address(x.strip()))' <<< $'1.02.3.4\n001.002.003.004'

That way you can do:

python -c $'import ipaddress, sys;\nfor x in sys.stdin.readlines():
print(ipaddress.ip_address(x.strip()))' < list-of-ip-addresses

I'm sure there's a nicer way with modules installed with other languages
but this is built into Python as of version 3.3.

Andrew
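
One caveat when running this on a current Python: as far as can be told from the `ipaddress` documentation, releases from about 3.9.5 onward reject IPv4 octets with leading zeros, so the one-liners above may raise `ValueError` instead of cleaning the input. A minimal sketch that strips the zeros first and then lets `ipaddress` validate the result; `clean_ip` is a hypothetical helper name:

```python
import ipaddress

def clean_ip(text):
    """Drop leading zeros from each octet, then validate with ipaddress."""
    parts = text.strip().split(".")
    if not all(p.isascii() and p.isdigit() for p in parts):
        raise ValueError(f"not a dotted-decimal address: {text!r}")
    candidate = ".".join(str(int(p)) for p in parts)
    # ip_address() raises ValueError for a wrong octet count or a value > 255
    return str(ipaddress.ip_address(candidate))
```

To process a file, call it once per non-empty line, for example `for line in sys.stdin: print(clean_ip(line))` after `import sys`.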





Re: [gentoo-user] OT scripting - strip zero if between period and digit

2019-01-21 Thread Grant Taylor

On 1/21/19 5:02 PM, Michael Orlitzky wrote:
> You need a parser, not a regular expression.

The first thing that came to mind is splitting the values and passing
them through printf.

> (You can do it with a regex, but it's going to be one of those comical
> twelve-page-long things.)

I don't know about 12 pages.  But, yes, a regular expression that takes
all the possible cases into account, especially for the full four-octet
IP, will be … complicated.  A regular expression to work on an individual
octet might be less complicated.

You can play with REs fairly easily via sed.
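
The split-and-printf idea mentioned above might look like this in Python, with `%d` formatting standing in for the shell's printf. It is only a sketch under the assumption that each input is a bare dotted quad, and the function name is illustrative:

```python
def reformat_quad(address):
    """Split on periods and print each octet back as a plain decimal."""
    octets = address.strip().split(".")
    if len(octets) != 4:
        raise ValueError(f"expected four octets, got {len(octets)}")
    # int() parses the string "088" as decimal 88; %d prints it unpadded
    return "%d.%d.%d.%d" % tuple(int(octet) for octet in octets)
```

One reason to prefer this over the shell's own printf: in bash, `printf '%d'` treats a leading zero as octal, so an octet like 088 is rejected there, whereas Python's `int()` on a string reads it as decimal.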



Re: [gentoo-user] OT scripting - strip zero if between period and digit

2019-01-21 Thread Michael Orlitzky

On 1/21/19 6:50 PM, Adam Carter wrote:
> I need to clean up a file which has IP addresses with leading zeros in
> some of the octets so I need to make, say, .09 into .9
>
> How do I do that in sed/awk/whatever?


The first thing you should do is construct a bunch of test cases, with 
all of the possible input representations and what you think the output 
representation should be. Then, you should write a program in something 
other than bash that passes all of the test cases. It's not as easy as 
it sounds; for example:


  * What happens to 0.1.2.3?

  * What happens to 01.2.3.4?

  * What happens to 1.2.3.0?

  * What happens to 1.2.000.3?

You need a parser, not a regular expression. (You can do it with a 
regex, but it's going to be one of those comical twelve-page-long things.)




[gentoo-user] OT scripting - strip zero if between period and digit

2019-01-21 Thread Adam Carter
I need to clean up a file which has IP addresses with leading zeros in some
of the octets so I need to make, say, .09 into .9

How do I do that in sed/awk/whatever?