Re: Python re to separate some data values

2021-04-28 Thread Joshua Judson Rosen

On 4/28/21 7:01 PM, Bruce Labitt wrote:
> On 4/28/21 6:28 PM, Joshua Judson Rosen wrote:
>>> re.search('(\.)\d{3,3}', r1[1]) returns
>>>  so it found the first instance.
>>>
>>> But, re.sub('(\.)\d{3,3}', '(\.)\d{3,3}, ', r1[1]) yields a KeyError:
>>> '\\d' (Python3.8).  Get bad escape \d at position 4.
>> The second argument [the replacement string] to re.sub(pattern, repl, 
>> string) is not supposed to
>> just be a variation of the pattern-matching string that you passed as the 
>> first argument.
>>
>> I think the best illustration that I can give here is to just fix this up 
>> for you:
>>
>>  re.sub(r'(\.)(\d{3,3})', r'\1\2, ', r1[1])
>>
> Thanks for the embarrassingly concise answer.  It is greatly 
> appreciated.  Can you explain the syntax of the 2nd argument?  I haven't 
> seen that before.  Where can I find further examples?
> 
> What astounds me is re.search allowed my 1st argument, but re.sub barfed 
> all over the same 1st argument.

Actually re.sub also accepted your first argument just fine.
It was your _second_ argument that it barfed all over,
because your match didn't produce a "matched character group #d",
it only produced a "matched character group #1"
(IIRC Python's RE documentation generally just calls them "groups").
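
A minimal sketch of that point, assuming the sample string from the original post (the variable names are only illustrative, and the exact error wording may differ between Python versions). The pattern compiles and matches on its own; the error only appears once the replacement string is parsed:

```python
import re

s = ' 17.98017.87417.65517.59917.43917.291'  # sample string from the original post
pattern = r'(\.)\d{3,3}'

# The pattern itself is valid: it compiles, and search() finds the first match.
m = re.compile(pattern).search(s)
first_match = m.group(0)  # the whole match
group_1 = m.group(1)      # the only group this pattern defines

# The failure comes from the replacement string: there, "\d" is not a
# character class, and an unknown letter escape like that is rejected.
try:
    re.sub(pattern, r'(\.)\d{3,3}, ', s)
    error_message = None
except re.error as exc:
    error_message = str(exc)

print(first_match, group_1, error_message)
```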

Note that I added a second set of parentheses to your _pattern_
so that you now have also a group #2.

I was trying to make the smallest change possible to your pattern,
but this also would work fine:

re.sub(r'(\.\d{3,3})', r'\1, ', r1[1])
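
As a rough check, assuming the sample string from the original post, both forms can be run side by side. Note that either one also appends ", " after the last number and keeps the leading space, so the final strip() here is an extra cleanup step that was not part of the suggestion above:

```python
import re

s = ' 17.98017.87417.65517.59917.43917.291'  # sample string from the original post

# Two groups (the dot, then the three digits), both referenced in the replacement.
two_groups = re.sub(r'(\.)(\d{3,3})', r'\1\2, ', s)

# One group covering the dot and the digits together.
one_group = re.sub(r'(\.\d{3,3})', r'\1, ', s)

# Both results still carry the leading space and a trailing ", ";
# strip(', ') removes commas and spaces from both ends only.
cleaned = one_group.strip(', ')

print(repr(two_groups))
print(repr(cleaned))
```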


The "\1" (and "\2", in the previous example) are "backreferences",
and are actually explained in an OK-ish way in the online Python library
manual's section for re:

https://docs.python.org/3/library/re.html

(there are also a few other backreference syntaxes that you can use in Python,
 so that you can give non-numeric names to them or just avoid ambiguities like
 whether "\20" means `group #2 and then a literal "0"' or `group #20'...).
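
A short sketch of those other syntaxes, assuming the same sample string; the group name "frac" and the toy pattern '(a)(b)' are made up for illustration:

```python
import re

s = ' 17.98017.87417.65517.59917.43917.291'  # sample string from the original post

# A named group in the pattern, referenced by name in the replacement.
named = re.sub(r'(?P<frac>\.\d{3})', r'\g<frac>, ', s)

# \g<1> is the unambiguous spelling of \1.
numbered = re.sub(r'(\.\d{3})', r'\g<1>, ', s)

# \g<2>0 means "group #2 and then a literal 0" ...
group_then_zero = re.sub(r'(a)(b)', r'\g<2>0', 'ab')

# ... whereas \20 is read as group #20, which this pattern does not have.
try:
    re.sub(r'(a)(b)', r'\20', 'ab')
    group_20_error = None
except re.error as exc:
    group_20_error = str(exc)

print(repr(named))
print(group_then_zero, group_20_error)
```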

-- 
Connect with me on the GNU social network! 

Not on the network? Ask me for more info!
___
gnhlug-discuss mailing list
gnhlug-discuss@mail.gnhlug.org
http://mail.gnhlug.org/mailman/listinfo/gnhlug-discuss/


Re: Python re to separate some data values

2021-04-28 Thread Bruce Labitt
On 4/28/21 6:35 PM, Henry Gessau wrote:
> On 4/28/2021 17:57, Bruce Labitt wrote:
>> I've looked in https://www.w3schools.com/python/python_regex.asp,
>> https://docs.python.org/3/library/re.html,
>> https://docs.python.org/3.8/howto/regex.html,
>> https://www.guru99.com/python-regular-expressions-complete-tutorial.html#2,
>> https://www.makeuseof.com/regular-expressions-python/, and
>> https://www.dataquest.io/blog/regular-expressions-data-scientists/ and
>> https://realpython.com/regex-python/
> You've missed the best site of all for regexes: https://regex101.com
> Indispensable for developing and debugging regexes.
>
> Here is my 2-minute attempt: https://regex101.com/r/jGu82j/1
>
How does that site work?  I went there, but there weren't any apparent
instructions or guidelines on what to do.  It's not clear to me that what's
on that site directly maps to re in Python.  Still, it is an interesting
concept.



Re: Python re to separate some data values

2021-04-28 Thread Bruce Labitt
On 4/28/21 6:28 PM, Joshua Judson Rosen wrote:
> On 4/28/21 5:57 PM, Bruce Labitt wrote:
>> If someone could suggest how to do this, I'd appreciate it.  I've
>> scraped a table of fine thread metric screw parameters from a website.
>> I'm having some trouble with regex (re) separating the numbers.  Have
>> everything working save for this last bit.
>>
>> Here is a sample string:
>>
>> r1[1] = ' 17.98017.87417.65517.59917.43917.291'
>>
>> I'm trying to separate the numbers.  It should read like this:
>>
>> 17.980, 17.874, 17.655, 17.599, 17.439, 17.291
>>
>> There's more than 200 lines of this, so it would be great to automate
>> it!  Each number has 3 digits of precision, so I want to add a comma and
>> a space after the third digit.
>>
>> re.search('(\.)\d{3,3}', r1[1]) returns
>>  so it found the first instance.
>>
>> But, re.sub('(\.)\d{3,3}', '(\.)\d{3,3}, ', r1[1]) yields a KeyError:
>> '\\d' (Python3.8).  Get bad escape \d at position 4.
> The second argument [the replacement string] to re.sub(pattern, repl, string) 
> is not supposed to
> just be a variation of the pattern-matching string that you passed as the 
> first argument.
>
> I think the best illustration that I can give here is to just fix this up for 
> you:
>
>   re.sub(r'(\.)(\d{3,3})', r'\1\2, ', r1[1])
>
Thanks for the embarrassingly concise answer.  It is greatly 
appreciated.  Can you explain the syntax of the 2nd argument?  I haven't 
seen that before.  Where can I find further examples?

What astounds me is re.search allowed my 1st argument, but re.sub barfed 
all over the same 1st argument.



Re: Python re to separate some data values

2021-04-28 Thread Henry Gessau
On 4/28/2021 17:57, Bruce Labitt wrote:
> I've looked in https://www.w3schools.com/python/python_regex.asp, 
> https://docs.python.org/3/library/re.html, 
> https://docs.python.org/3.8/howto/regex.html, 
> https://www.guru99.com/python-regular-expressions-complete-tutorial.html#2, 
> https://www.makeuseof.com/regular-expressions-python/, and 
> https://www.dataquest.io/blog/regular-expressions-data-scientists/ and 
> https://realpython.com/regex-python/

You've missed the best site of all for regexes: https://regex101.com
Indispensable for developing and debugging regexes.

Here is my 2-minute attempt: https://regex101.com/r/jGu82j/1


Re: Python re to separate some data values

2021-04-28 Thread Joshua Judson Rosen
On 4/28/21 5:57 PM, Bruce Labitt wrote:
> If someone could suggest how to do this, I'd appreciate it.  I've 
> scraped a table of fine thread metric screw parameters from a website.  
> I'm having some trouble with regex (re) separating the numbers.  Have 
> everything working save for this last bit.
> 
> Here is a sample string:
> 
> r1[1] = ' 17.98017.87417.65517.59917.43917.291'
> 
> I'm trying to separate the numbers.  It should read like this:
> 
> 17.980, 17.874, 17.655, 17.599, 17.439, 17.291
> 
> There's more than 200 lines of this, so it would be great to automate 
> it!  Each number has 3 digits of precision, so I want to add a comma and 
> a space after the third digit.
> 
> re.search('(\.)\d{3,3}', r1[1]) returns
>  so it found the first instance.
> 
> But, re.sub('(\.)\d{3,3}', '(\.)\d{3,3}, ', r1[1]) yields a KeyError: 
> '\\d' (Python3.8).  Get bad escape \d at position 4.
The second argument [the replacement string] to re.sub(pattern, repl, string) 
is not supposed to
just be a variation of the pattern-matching string that you passed as the first 
argument.

I think the best illustration that I can give here is to just fix this up for 
you:

re.sub(r'(\.)(\d{3,3})', r'\1\2, ', r1[1])
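
For comparison, here is a different technique from the substitution above (not something anyone in the thread proposed): pull the numbers out with re.findall and join them. This is only a sketch, and it assumes every value is some digits, a dot, and exactly three decimal digits, as in the sample string:

```python
import re

s = ' 17.98017.87417.65517.59917.43917.291'  # sample string from the original post

# Each value: one or more digits, a dot, then exactly three digits.
values = re.findall(r'\d+\.\d{3}', s)

# Join for display, or convert to floats for further processing.
joined = ', '.join(values)
as_floats = [float(v) for v in values]

print(joined)
```

This avoids the trailing ", " that the substitution leaves behind, at the cost of depending on the fixed three-decimal layout.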
