Re: [gentoo-user] OT scripting - strip zero if between period and digit
Hello,

On Wed, 23 Jan 2019, Adam Carter wrote:
>> $ printf '0.1.2.3 01.2.3.4 1.2.3.0 1.2.000.3\n' | \
>>     sed 's/0*\([[:digit:]]\+\)/\1/g'
>> 0.1.2.3 1.2.3.4 1.2.3.0 1.2.0.3
>
>So [[:digit:]] is another way of writing [0-9] and the + just means another
>instance of the preceding expression, right, so your and François's
>solutions are functionally the same, and all the following are the same
>too, right?

Not quite.

    [[:digit:]]+ == [0-9][0-9]*

Note too that [:digit:] respects the locale. I don't know where the locale
actually makes a difference, though. Probably Devanagari or such. I have
just made a habit of using classes rather than character ranges.

See man 7 regex for more classes.

HTH,
-dnh

-- 
"UNIX was not designed to stop people from doing stupid things, because
that would also stop them from doing clever things." -- Doug Gwyn
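The difference between a character class and an explicit range can be illustrated outside sed. The sketch below uses Python's `re` module, where `\d` on text patterns also accepts non-ASCII decimal digits unless `re.ASCII` is given. This is only an analogy: it does not show how `[[:digit:]]` behaves in any particular sed or locale, and the helper name `matches` is made up for the example.

```python
import re

ASCII_RANGE = re.compile(r"[0-9]+")        # explicit range: ASCII digits only
UNICODE_DIGITS = re.compile(r"\d+")        # Unicode-aware on str patterns
ASCII_DIGITS = re.compile(r"\d+", re.ASCII)  # \d restricted to ASCII

# Devanagari digits one, two, three (U+0967..U+0969)
devanagari = "\u0967\u0968\u0969"


def matches(pattern, text):
    """Return True if the whole of text matches pattern."""
    return pattern.fullmatch(text) is not None


for name, pattern in [("[0-9]+", ASCII_RANGE),
                      (r"\d+", UNICODE_DIGITS),
                      (r"\d+ with re.ASCII", ASCII_DIGITS)]:
    print(name, matches(pattern, "123"), matches(pattern, devanagari))
```

Whether such digits should count is a decision about the input data, which is one more reason to write down test cases first.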
Re: [gentoo-user] OT scripting - strip zero if between period and digit
On 1/23/19 5:52 AM, Wols Lists wrote:
> I've just done a bit of digging, and would this work to match an octet?
>
> [0-9][0-9]?[0-9]?

It doesn't match 0123.

Regardless, using [0-9] is destined to fail because it will match things
like 999 that aren't octets either.
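A small sketch of that objection, in Python rather than sed: the three-digit pattern accepts 999, whereas a numeric range check does not. The function names `looks_like_octet` and `is_octet` are hypothetical, and allowing up to three digits with leading zeros is an assumption about the input, not something the thread settles.

```python
import re

THREE_DIGITS = re.compile(r"[0-9][0-9]?[0-9]?")


def looks_like_octet(text):
    """Pattern-only check: one to three ASCII digits."""
    return THREE_DIGITS.fullmatch(text) is not None


def is_octet(text):
    """Numeric check: one to three ASCII digits with a value of 0..255."""
    return (text.isascii() and text.isdigit()
            and len(text) <= 3 and int(text) <= 255)


for candidate in ["9", "088", "255", "256", "999", "0123"]:
    print(candidate, looks_like_octet(candidate), is_octet(candidate))
```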
Re: [gentoo-user] OT scripting - strip zero if between period and digit
On 23/01/19 07:37, Alexander Kapshuk wrote:
> On Wed, Jan 23, 2019 at 9:05 AM Paul Colquhoun wrote:
>> On Wednesday, 23 January 2019 5:52:57 PM AEDT Alexander Kapshuk wrote:
>>> On Wed, Jan 23, 2019 at 5:20 AM Adam Carter wrote:
>>>>> François-Xavier
>>>>>
>>>>> My bad, it should be:
>>>>>
>>>>> sed 's/0*\([0-9][0-9]*\)/\1/g'
>>>>>
>>>>> (tests are indeed needed!)
>>>>
>>>> Many thanks François. This is almost right, but it is also stripping
>>>> zeros that follow a letter, and I only want it to strip zeros that are
>>>> preceded by a period. There are no leading zeros in the first octet of
>>>> the IP so that case does not need to be handled.
>>>>
>>>> Does the \1 refer to what's in the ()'s? So anything that one would
>>>> want to carry through should be inside the ()'s and anything that's
>>>> outside is stripped, right?
>>>
>>> Would something like this do the trick?
>>> echo 198.088.062.01 | sed 's/\.0/./g'
>>> 198.88.62.1
>>
>> In a word, no.
>>
>> echo 198.088.0.01 | sed 's/\.0/./g'
>> 198.88..1
>>
>> --
>> Reverend Paul Colquhoun, ULC. http://andor.dropbear.id.au/
>> Asking for technical help in newsgroups? Read this first:
>> http://catb.org/~esr/faqs/smart-questions.html#intro
>
> How about this one?
>
> echo '198.088.0.01
> 198.088.062.01' | sed 's/\.0\([0-9][0-9]*\)/.\1/g'
> 198.88.0.1
> 198.88.62.1

I've just done a bit of digging, and would this work to match an octet?

[0-9][0-9]?[0-9]?

I know ? normally matches a single character, but apparently in this syntax
it means "0 or 1 occurrence of the preceding expression". So that will
detect a number consisting of at most three digits.

I thought there must be a "detect a single optional character" operator ...
:-)

Cheers,
Wol
Re: [gentoo-user] OT scripting - strip zero if between period and digit
On Wed, Jan 23, 2019 at 9:05 AM Paul Colquhoun wrote:
>
> On Wednesday, 23 January 2019 5:52:57 PM AEDT Alexander Kapshuk wrote:
> > On Wed, Jan 23, 2019 at 5:20 AM Adam Carter wrote:
> > >> > François-Xavier
> > >> >
> > >> My bad, it should be:
> > >>
> > >> sed 's/0*\([0-9][0-9]*\)/\1/g'
> > >>
> > >> (tests are indeed needed!)
> > >
> > > Many thanks François. This is almost right, but it is also stripping zeros
> > > that follow a letter, and I only want it to strip zeros that are
> > > preceded by a period. There are no leading zeros in the first octet of
> > > the IP so that case does not need to be handled.
> > >
> > > Does the \1 refer to what's in the ()'s? So anything that one would want
> > > to carry through should be inside the ()'s and anything that's outside is
> > > stripped, right?
> > Would something like this do the trick?
> > echo 198.088.062.01 | sed 's/\.0/./g'
> > 198.88.62.1
>
> In a word, no.
>
> echo 198.088.0.01 | sed 's/\.0/./g'
> 198.88..1
>
>
> --
> Reverend Paul Colquhoun, ULC. http://andor.dropbear.id.au/
> Asking for technical help in newsgroups? Read this first:
> http://catb.org/~esr/faqs/smart-questions.html#intro
>

How about this one?

echo '198.088.0.01
198.088.062.01' | sed 's/\.0\([0-9][0-9]*\)/.\1/g'
198.88.0.1
198.88.62.1
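For comparison, here is a rough Python transcription of the two sed expressions discussed in this message, plus one variant that is not from the thread. It suggests that the second expression removes only a single zero after each period, so an octet such as 001 would come out as 01. The function names are invented for the example, and `strip_all_zeros` is an assumption about what the desired behaviour might be.

```python
import re


def strip_naive(text):
    """Transcription of: sed 's/\\.0/./g'"""
    return re.sub(r"\.0", ".", text)


def strip_one_zero(text):
    """Transcription of: sed 's/\\.0\\([0-9][0-9]*\\)/.\\1/g'"""
    return re.sub(r"\.0([0-9][0-9]*)", r".\1", text)


def strip_all_zeros(text):
    """Hypothetical variant: drop every zero between a period and a digit."""
    return re.sub(r"\.0+([0-9])", r".\1", text)


for sample in ["198.088.0.01", "198.088.062.01", "1.2.001.3", "1.2.000.3"]:
    print(sample, strip_naive(sample), strip_one_zero(sample),
          strip_all_zeros(sample))
```

Requiring a digit after the zeros is what keeps a bare `.0` octet intact in the last two functions.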
Re: [gentoo-user] OT scripting - strip zero if between period and digit
On Wednesday, 23 January 2019 5:52:57 PM AEDT Alexander Kapshuk wrote:
> On Wed, Jan 23, 2019 at 5:20 AM Adam Carter wrote:
> >> > François-Xavier
> >>
> >> My bad, it should be:
> >>
> >> sed 's/0*\([0-9][0-9]*\)/\1/g'
> >>
> >> (tests are indeed needed!)
> >
> > Many thanks François. This is almost right, but it is also stripping zeros
> > that follow a letter, and I only want it to strip zeros that are
> > preceded by a period. There are no leading zeros in the first octet of
> > the IP so that case does not need to be handled.
> >
> > Does the \1 refer to what's in the ()'s? So anything that one would want
> > to carry through should be inside the ()'s and anything that's outside is
> > stripped, right?
> Would something like this do the trick?
> echo 198.088.062.01 | sed 's/\.0/./g'
> 198.88.62.1

In a word, no.

echo 198.088.0.01 | sed 's/\.0/./g'
198.88..1

-- 
Reverend Paul Colquhoun, ULC. http://andor.dropbear.id.au/
Asking for technical help in newsgroups? Read this first:
http://catb.org/~esr/faqs/smart-questions.html#intro
Re: [gentoo-user] OT scripting - strip zero if between period and digit
On Wed, Jan 23, 2019 at 5:20 AM Adam Carter wrote:
>> > François-Xavier
>> >
>> My bad, it should be:
>>
>> sed 's/0*\([0-9][0-9]*\)/\1/g'
>>
>> (tests are indeed needed!)
>
> Many thanks François. This is almost right, but it is also stripping zeros
> that follow a letter, and I only want it to strip zeros that are preceded by
> a period. There are no leading zeros in the first octet of the IP so that
> case does not need to be handled.
>
> Does the \1 refer to what's in the ()'s? So anything that one would want to
> carry through should be inside the ()'s and anything that's outside is
> stripped, right?
>

Would something like this do the trick?
echo 198.088.062.01 | sed 's/\.0/./g'
198.88.62.1
Re: [gentoo-user] OT scripting - strip zero if between period and digit
On 23/01/2019 at 04:19, Adam Carter wrote:
>> François-Xavier
>>
>> My bad, it should be:
>>
>> sed 's/0*\([0-9][0-9]*\)/\1/g'
>>
>> (tests are indeed needed!)
>
> Many thanks François. This is almost right, but it is also stripping zeros
> that follow a letter, and I only want it to strip zeros that are preceded
> by a period. There are no leading zeros in the first octet of the IP so
> that case does not need to be handled.
>
> Does the \1 refer to what's in the ()'s? So anything that one would want
> to carry through should be inside the ()'s and anything that's outside is
> stripped, right?

Yes, \1 is the content in (). But adding letters inside won't solve the
problem, eg. "a01" will still be changed to "a1".

AFAIK, there is no way to express "start of line or a character" in sed,
but you could use two regexps, one starting with ^ (start of line), the
other with \. (dot):

sed 's/^0*\([0-9][0-9]*\)/\1/g;s/\.0*\([0-9][0-9]*\)/.\1/g'
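The two-expression approach can be tried out in Python, where "start of line or a dot" is also expressible as a single alternation. The sketch below transcribes the two sed substitutions into `two_pass` and adds a `single_pass` variant; both names and the single-pattern form are assumptions for illustration, and `re.MULTILINE` stands in for sed processing one line at a time.

```python
import re


def two_pass(text):
    """Transcription of the two sed substitutions, applied in order."""
    text = re.sub(r"^0*([0-9][0-9]*)", r"\1", text, flags=re.MULTILINE)
    return re.sub(r"\.0*([0-9][0-9]*)", r".\1", text)


# Zeros at the start of a line or after a dot, but only when a digit follows.
LEADING_ZEROS = re.compile(r"(^|\.)0+(?=[0-9])", re.MULTILINE)


def single_pass(text):
    """Hypothetical single-pattern form using alternation and a lookahead."""
    return LEADING_ZEROS.sub(r"\1", text)


for sample in ["01.2.3.4", "0.1.2.3", "1.2.000.3", "198.088.0.01", "a01"]:
    print(sample, two_pass(sample), single_pass(sample))
```

Neither form touches "a01", since the zero there follows neither a line start nor a dot.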
Re: [gentoo-user] OT scripting - strip zero if between period and digit
On Wednesday, 23 January 2019 2:32:43 PM AEDT Adam Carter wrote:
> > $ printf '0.1.2.3 01.2.3.4 1.2.3.0 1.2.000.3\n' | \
> >     sed 's/0*\([[:digit:]]\+\)/\1/g'
> > 0.1.2.3 1.2.3.4 1.2.3.0 1.2.0.3
>
> Hi David - thanks for that.
>
> So [[:digit:]] is another way of writing [0-9] and the + just means another
> instance of the preceding expression, right, so your and François's
> solutions are functionally the same, and all the following are the same
> too, right?
>
> [[:digit:]]+
> [[:digit:]][[:digit:]]
> [0-9]+
> [0-9][0-9]

Not quite. A trailing '+' means "1 or more of the preceding item", while a
trailing '*' means "0 or more".

[0-9]+ would match any string consisting of only digits, no matter how
long, but not an empty string. [0-9][0-9], by contrast, matches exactly two
digits.

-- 
Reverend Paul Colquhoun, ULC. http://andor.dropbear.id.au/
Asking for technical help in newsgroups? Read this first:
http://catb.org/~esr/faqs/smart-questions.html#intro
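The quantifier semantics are easy to check interactively. A minimal sketch in Python, whose `+` and `*` have the same meaning here as in the extended regex syntax; the helper `full` is a made-up name that tests whether a pattern matches a whole string:

```python
import re


def full(pattern, text):
    """Return True if pattern matches the entire text."""
    return re.fullmatch(pattern, text) is not None


patterns = [r"[0-9]+", r"[0-9]*", r"[0-9][0-9]*", r"[0-9][0-9]"]
samples = ["", "7", "12", "12345"]

for pattern in patterns:
    print(pattern, [full(pattern, s) for s in samples])
```

Only `[0-9]*` accepts the empty string, and only `[0-9][0-9]` insists on exactly two digits; `[0-9]+` and `[0-9][0-9]*` agree on every sample.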
Re: [gentoo-user] OT scripting - strip zero if between period and digit
> $ printf '0.1.2.3 01.2.3.4 1.2.3.0 1.2.000.3\n' | \
>     sed 's/0*\([[:digit:]]\+\)/\1/g'
> 0.1.2.3 1.2.3.4 1.2.3.0 1.2.0.3
>

Hi David - thanks for that.

So [[:digit:]] is another way of writing [0-9] and the + just means another
instance of the preceding expression, right, so your and François's
solutions are functionally the same, and all the following are the same
too, right?

[[:digit:]]+
[[:digit:]][[:digit:]]
[0-9]+
[0-9][0-9]
Re: [gentoo-user] OT scripting - strip zero if between period and digit
On Wed, Jan 23, 2019 at 12:34 AM Michael Orlitzky wrote:
> On 1/21/19 9:55 PM, David Haller wrote:
> >
> > $ printf '0.1.2.3 01.2.3.4 1.2.3.0 1.2.000.3\n' | \
> >     sed 's/0*\([[:digit:]]\+\)/\1/g'
> > 0.1.2.3 1.2.3.4 1.2.3.0 1.2.0.3
> >
>
> There are actually more than four examples that it needs to work on. And
> more to the point, this is going to destroy any other numbers it finds
> in the input. Phone numbers, zip codes, addresses, credit card numbers,
> timestamps, etc. will all get clobbered. It takes like 10 lines of
> python to do this right; it's silly to invest a ton of effort trying to
> come up with a regex solution that accidentally works.
>

Thanks Michael. The input data is constrained in ways I didn't list, so it
might be possible to get away with a regex, but I appreciate you
highlighting the risk of what sounds like a brittle approach. I am hopeful
that one day learning python will make it to the top of my priority list.
Re: [gentoo-user] OT scripting - strip zero if between period and digit
> > François-Xavier
> >
> My bad, it should be:
>
> sed 's/0*\([0-9][0-9]*\)/\1/g'
>
> (tests are indeed needed!)
>

Many thanks François. This is almost right, but it is also stripping zeros
that follow a letter, and I only want it to strip zeros that are preceded
by a period. There are no leading zeros in the first octet of the IP so
that case does not need to be handled.

Does the \1 refer to what's in the ()'s? So anything that one would want to
carry through should be inside the ()'s and anything that's outside is
stripped, right?
Re: [gentoo-user] OT scripting - strip zero if between period and digit
On 1/21/19 9:55 PM, David Haller wrote:
>
> $ printf '0.1.2.3 01.2.3.4 1.2.3.0 1.2.000.3\n' | \
>     sed 's/0*\([[:digit:]]\+\)/\1/g'
> 0.1.2.3 1.2.3.4 1.2.3.0 1.2.0.3
>

There are actually more than four examples that it needs to work on. And
more to the point, this is going to destroy any other numbers it finds in
the input. Phone numbers, zip codes, addresses, credit card numbers,
timestamps, etc. will all get clobbered. It takes like 10 lines of python
to do this right; it's silly to invest a ton of effort trying to come up
with a regex solution that accidentally works.
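One possible reading of "10 lines of python" is sketched below. It is not the code the author had in mind, which the thread never shows. It assumes addresses appear as whitespace-separated tokens made of exactly four decimal octets, and it leaves every other token alone; `normalize_token` and `normalize_text` are hypothetical names.

```python
import re

TOKEN = re.compile(r"\S+")  # assumption: addresses are whitespace-delimited


def normalize_token(token):
    """Rewrite token only if it is four decimal octets in the range 0..255."""
    parts = token.split(".")
    if len(parts) != 4:
        return token
    if not all(p.isascii() and p.isdigit() and len(p) <= 3 and int(p) <= 255
               for p in parts):
        return token
    return ".".join(str(int(p)) for p in parts)


def normalize_text(text):
    """Normalize every address-like token, keeping all other text as it is."""
    return TOKEN.sub(lambda m: normalize_token(m.group(0)), text)


print(normalize_text("host 010.001.000.009 tel 0412.345.678 zip 02134"))
```

A token with trailing punctuation, such as an address followed by a comma, is left unchanged by this sketch; whether that matters depends on the real input.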
Re: [gentoo-user] OT scripting - strip zero if between period and digit
Hello,

On Mon, 21 Jan 2019, Michael Orlitzky wrote:
>On 1/21/19 6:50 PM, Adam Carter wrote:
>> I need to clean up a file which has IP addresses with leading zeros in
>> some of the octets so I need to make, say, .09 into .9
>>
>> How do I do that in sed/awk/whatever?
>
>The first thing you should do is construct a bunch of test cases, with all of
>the possible input representations and what you think the output
>representation should be. Then, you should write a program in something other
>than bash that passes all of the test cases. It's not as easy as it sounds;
>for example:
>
> * What happens to 0.1.2.3?
>
> * What happens to 01.2.3.4?
>
> * What happens to 1.2.3.0?
>
> * What happens to 1.2.000.3?
>
>You need a parser, not a regular expression. (You can do it with a regex, but
>it's going to be one of those comical twelve-page-long things.)

$ printf '0.1.2.3 01.2.3.4 1.2.3.0 1.2.000.3\n' | \
    sed 's/0*\([[:digit:]]\+\)/\1/g'
0.1.2.3 1.2.3.4 1.2.3.0 1.2.0.3

HTH,
-dnh

-- 
printk(KERN_DEBUG "adintr: Why?\n");
        linux-2.6.19/sound/oss/ad1848.c
Re: [gentoo-user] OT scripting - strip zero if between period and digit
On 22/01/2019 at 03:05, François-Xavier CARTON wrote:
> On 22/01/2019 at 00:50, Adam Carter wrote:
>> I need to clean up a file which has IP addresses with leading zeros in
>> some of the octets so I need to make, say, .09 into .9
>>
>> How do I do that in sed/awk/whatever?
>
> I believe that should do: sed 's/0*\([0-9]\)/\1/g'
>
> eg.
>
> $ sed 's/0*\([0-9]\)/\1/g' <

My bad, it should be:

sed 's/0*\([0-9][0-9]*\)/\1/g'

(tests are indeed needed!)

François-Xavier
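To see why the first expression needed correcting, both can be transcribed into Python and run side by side. The sample address is invented for the illustration. With a single-digit capture group, the zeros inside a number such as 100 are treated as leading zeros of the digit that follows them.

```python
import re


def first_attempt(text):
    """Transcription of: sed 's/0*\\([0-9]\\)/\\1/g'"""
    return re.sub(r"0*([0-9])", r"\1", text)


def corrected(text):
    """Transcription of: sed 's/0*\\([0-9][0-9]*\\)/\\1/g'"""
    return re.sub(r"0*([0-9][0-9]*)", r"\1", text)


sample = "100.020.3.004"  # hypothetical test input
print(first_attempt(sample))  # interior zeros are lost
print(corrected(sample))      # only leading zeros are removed
```

The corrected form still rewrites any digit run anywhere in the input, for example turning "a01" into "a1".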
Re: [gentoo-user] OT scripting - strip zero if between period and digit
On 22/01/2019 at 00:50, Adam Carter wrote:
> I need to clean up a file which has IP addresses with leading zeros in
> some of the octets so I need to make, say, .09 into .9
>
> How do I do that in sed/awk/whatever?

I believe that should do: sed 's/0*\([0-9]\)/\1/g'

eg.

$ sed 's/0*\([0-9]\)/\1/g' <
Re: [gentoo-user] OT scripting - strip zero if between period and digit
On 21/01/2019 18:50, Adam Carter wrote:
> I need to clean up a file which has IP addresses with leading zeros in
> some of the octets so I need to make, say, .09 into .9
>
> How do I do that in sed/awk/whatever?

A regex would be difficult. Parser is what you want.

You could use Python's ipaddress module (Python 3.3+). It will fix your IPs
(below is all one line):

python -c $'import ipaddress, sys;\nfor x in sys.argv[1:]: print(ipaddress.ip_address(x))' 1.02.3.4 001.002.003.004

Output:

1.2.3.4
1.2.3.4

Fix that for stdin:

python -c $'import ipaddress, sys;\nfor x in sys.stdin.readlines(): print(ipaddress.ip_address(x.strip()))' <<< $'1.02.3.4\n001.002.003.004'

That way you can do:

python -c $'import ipaddress, sys;\nfor x in sys.stdin.readlines(): print(ipaddress.ip_address(x.strip()))' < list-of-ip-addresses

I'm sure there's a nicer way with modules installed with other languages
but this is built into Python as of version 3.3.

Andrew
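One caveat for anyone trying this today: as far as I know, newer Python releases (3.9.5 and later) make the ipaddress module reject octets with leading zeros, so the one-liners above may raise ValueError there instead of cleaning the input. A sketch that should behave the same on old and new versions strips the zeros first and uses ipaddress only for validation. The function name `normalize_ipv4` is hypothetical, and the input is assumed to hold one IPv4 address per line.

```python
import ipaddress
import sys


def normalize_ipv4(text):
    """Return text as a dotted quad without leading zeros, or raise ValueError."""
    octets = text.strip().split(".")
    if len(octets) != 4 or not all(o.isascii() and o.isdigit() for o in octets):
        raise ValueError("not a dotted-quad IPv4 address: %r" % text)
    cleaned = ".".join(str(int(o)) for o in octets)
    # IPv4Address checks that each octet is within 0..255.
    return str(ipaddress.IPv4Address(cleaned))


if __name__ == "__main__":
    # Reads one address per line from stdin when data is piped in.
    if not sys.stdin.isatty():
        for line in sys.stdin:
            if line.strip():
                print(normalize_ipv4(line))
```

Saved under a name of your choosing, it can be fed a file of addresses on standard input, much like the one-liners above.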
Re: [gentoo-user] OT scripting - strip zero if between period and digit
On 1/21/19 5:02 PM, Michael Orlitzky wrote:
> You need a parser, not a regular expression.

The first thing that came to mind is splitting the values and passing them
through printf.

> (You can do it with a regex, but it's going to be one of those comical
> twelve-page-long things.)

I don't know about 12 pages. But, yes, a regular expression that takes all
the possible cases into account, especially as the four octet IP, will be
… complicated. A regular expression to work on an individual octet might
be less complicated.

You can play with REs fairly easily via sed.
Re: [gentoo-user] OT scripting - strip zero if between period and digit
On 1/21/19 6:50 PM, Adam Carter wrote:
> I need to clean up a file which has IP addresses with leading zeros in
> some of the octets so I need to make, say, .09 into .9
>
> How do I do that in sed/awk/whatever?

The first thing you should do is construct a bunch of test cases, with all
of the possible input representations and what you think the output
representation should be. Then, you should write a program in something
other than bash that passes all of the test cases. It's not as easy as it
sounds; for example:

 * What happens to 0.1.2.3?

 * What happens to 01.2.3.4?

 * What happens to 1.2.3.0?

 * What happens to 1.2.000.3?

You need a parser, not a regular expression. (You can do it with a regex,
but it's going to be one of those comical twelve-page-long things.)
[gentoo-user] OT scripting - strip zero if between period and digit
I need to clean up a file which has IP addresses with leading zeros in some
of the octets so I need to make, say, .09 into .9

How do I do that in sed/awk/whatever?