RE: How to escape strings for re.finditer?

2023-02-28 Thread avi.e.gross
Peter,

Nobody here would appreciate it if I tested it by sending out multiple
copies of each email to see if the same message wraps differently.

I am using a fairly standard mailer in Outlook that interfaces with gmail
and I could try mailing directly from gmail but apparently there are
systemic problems and I experience other complaints when sending directly
from AOL mail too. 

So, if some people don't read me, I can live with that. I mean the right
people, LOL!

Or did I get that wrong?

I do appreciate the feedback. Ironically, when I politely shared how someone
else's email was displaying on my screen, it seems I am equally causing
similar issues for others.

An interesting question is whether any of us reading the archived copies see
different things including with various browsers:

https://mail.python.org/pipermail/python-list/

I am not sure which letters from me had the anomalies you mention but
spot-checking a few of them showed a normal display when I use Chrome.

But none of this is really a Python issue except insofar as you never know
what functionality in the network was written in Python.

-Original Message-
From: Python-list  On
Behalf Of Peter J. Holzer
Sent: Tuesday, February 28, 2023 7:26 PM
To: python-list@python.org
Subject: Re: How to escape strings for re.finditer?

On 2023-03-01 01:01:42 +0100, Peter J. Holzer wrote:
> On 2023-02-28 15:25:05 -0500, avi.e.gr...@gmail.com wrote:
> > It happens to be easy for me to fix but I sometimes see garbled code 
> > I then simply ignore.
> 
> Truth to be told, that's one reason why I rarely read your mails to 
> the end. The long lines and the triple-spaced paragraphs make it just 
> too uncomfortable.

Hmm, since I was now paying a bit more attention to formatting problems I
saw that only about half of your messages have those long lines although all
seem to be sent with the same mailer. Don't know what's going on there.

hp


-- 
   _  | Peter J. Holzer| Story must make more sense than reality.
|_|_) ||
| |   | h...@hjp.at |-- Charles Stross, "Creative writing
__/   | http://www.hjp.at/ |   challenge!"

-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Is there a more efficient threading lock?

2023-02-28 Thread Chris Angelico
On Wed, 1 Mar 2023 at 10:04, Barry  wrote:
>
> > Though it's still probably not as useful as you might hope. In C, if I
> > can do "int id = counter++;" atomically, it would guarantee me a new
> > ID that no other thread could ever have.
>
> C does not have to do that atomically. In fact it is free to use lots of 
> instructions to build the int value. And some compilers indeed do, the linux 
> kernel folks see this in gcc generated code.
>
> I understand you have to use the new atomics features.
>

Yeah, I didn't have a good analogy so I went with a hypothetical.  The
atomicity would be more useful in that context as it would give
lock-free ID generation, which doesn't work in Python.
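
For comparison, a minimal sketch of the lock-based route in Python. The
class name IdGenerator and the helper collect_ids are made up for this
illustration; only threading.Lock and threading.Thread come from the
standard library.

```python
import threading


class IdGenerator:
    """Hypothetical helper: hands out unique integer IDs across threads."""

    def __init__(self):
        self._lock = threading.Lock()
        self._next_id = 0

    def new_id(self):
        # The read-increment-write sequence is not atomic on its own,
        # so it is done while holding the lock.
        with self._lock:
            current = self._next_id
            self._next_id += 1
            return current


def collect_ids(gen, count, out):
    # Each worker appends to its own list, so no extra locking is needed here.
    for _ in range(count):
        out.append(gen.new_id())


gen = IdGenerator()
results = [[] for _ in range(4)]
threads = [
    threading.Thread(target=collect_ids, args=(gen, 1000, r)) for r in results
]
for t in threads:
    t.start()
for t in threads:
    t.join()

all_ids = [i for r in results for i in r]
print(len(all_ids), len(set(all_ids)))
```

Every ID is handed out exactly once, at the cost of taking the lock on
each call.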

ChrisA
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: How to escape strings for re.finditer?

2023-02-28 Thread Weatherby,Gerard
Regex is fine if it works for you. The critiques – “difficult to read” – are 
subjective. Unless the code is in a section that has been profiled to be a 
bottleneck, I don’t sweat performance at this level.

For me, using code that has already been written and vetted is the preferred 
approach to writing new code I have to test and maintain. I use an online regex 
tester, https://pythex.org, to get the syntax right before copying and pasting 
it into my code.

From: Python-list  on 
behalf of Jen Kris via Python-list 
Date: Tuesday, February 28, 2023 at 1:11 PM
To: Thomas Passin 
Cc: python-list@python.org 
Subject: Re: How to escape strings for re.finditer?

Using str.startswith is a cool idea in this case.  But is it better than regex 
for performance or reliability?  Regex syntax is not a model of simplicity, but 
in my simple case it's not too difficult.


-- 
https://mail.python.org/mailman/listinfo/python-list


Re: How to escape strings for re.finditer?

2023-02-28 Thread Peter J. Holzer
On 2023-03-01 01:01:42 +0100, Peter J. Holzer wrote:
> On 2023-02-28 15:25:05 -0500, avi.e.gr...@gmail.com wrote:
> > It happens to be easy for me to fix but I sometimes see garbled code I
> > then simply ignore.
> 
> Truth to be told, that's one reason why I rarely read your mails to the
> end. The long lines and the triple-spaced paragraphs make it just too
> uncomfortable.

Hmm, since I was now paying a bit more attention to formatting problems
I saw that only about half of your messages have those long lines
although all seem to be sent with the same mailer. Don't know what's
going on there.

hp


-- 
   _  | Peter J. Holzer| Story must make more sense than reality.
|_|_) ||
| |   | h...@hjp.at |-- Charles Stross, "Creative writing
__/   | http://www.hjp.at/ |   challenge!"


-- 
https://mail.python.org/mailman/listinfo/python-list


Cryptic software announcements (was: ANN: DIPY 1.6.0)

2023-02-28 Thread Peter J. Holzer
[This isn't specifically about DIPY, I've noticed the same thing in
other announcements]

On 2023-02-28 13:48:56 -0500, Eleftherios Garyfallidis wrote:
> Hello all,
> 
> 
> We are excited to announce a new release of DIPY: DIPY 1.6.0 is out from
> the oven!

That's nice, but what is DIPY?


> In addition, registration for the oceanic DIPY workshop 2023 (April 24-28)
> is now open! Our comprehensive program is designed to equip you with the
> skills and knowledge needed to master the latest techniques and tools in
> structural and diffusion imaging.

Ok, so since the workshop is about ".., tools in structural and
diffusion imaging", DIPY is probably such a tool.

However, without this incidental announcement I wouldn't have any idea
what it is or if it would be worth my time clicking at any of the links.


I think it would be a good idea if software announcements would include
a single paragraph (or maybe just a single sentence) summarizing what
the software is and does.

hp

-- 
   _  | Peter J. Holzer| Story must make more sense than reality.
|_|_) ||
| |   | h...@hjp.at |-- Charles Stross, "Creative writing
__/   | http://www.hjp.at/ |   challenge!"


-- 
https://mail.python.org/mailman/listinfo/python-list


Re: How to escape strings for re.finditer?

2023-02-28 Thread Peter J. Holzer
On 2023-02-28 15:25:05 -0500, avi.e.gr...@gmail.com wrote:
> Jen,
> 
>  
> 
> I had no doubt the code you ran was indented properly or it would not work.
> 
>  
> 
> I am merely letting you know that somewhere in the process of copying
> the code or the transition between mailers, my version is messed up.

The problem seems to be at your end. Jen's code looks ok here.

The content type is text/plain, no format=flowed or anything which would
affect the interpretation of line endings. However, after
base64-decoding it only contains unix-style LF line endings, not CRLF
line endings. That might throw your mailer off, but I have no idea why
it would join only some lines but not others.

> It happens to be easy for me to fix but I sometimes see garbled code I
> then simply ignore.

Truth to be told, that's one reason why I rarely read your mails to the
end. The long lines and the triple-spaced paragraphs make it just too
uncomfortable.

hp

-- 
   _  | Peter J. Holzer| Story must make more sense than reality.
|_|_) ||
| |   | h...@hjp.at |-- Charles Stross, "Creative writing
__/   | http://www.hjp.at/ |   challenge!"


-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Is there a more efficient threading lock?

2023-02-28 Thread Barry
> Though it's still probably not as useful as you might hope. In C, if I
> can do "int id = counter++;" atomically, it would guarantee me a new
> ID that no other thread could ever have.

C does not have to do that atomically. In fact it is free to use lots of 
instructions to build the int value. And some compilers indeed do, the linux 
kernel folks see this in gcc generated code.

I understand you have to use the new atomics features.

Barry


-- 
https://mail.python.org/mailman/listinfo/python-list


Re: How to escape strings for re.finditer?

2023-02-28 Thread Cameron Simpson

On 28Feb2023 18:57, Jen Kris  wrote:
One question:  several people have made suggestions other than regex 
(not your terser example with regex you showed below).  Is there a 
reason why regex is not preferred to, for example, a list comp?  


These are different things; I'm not sure a comparison is meaningful.


Performance?  Reliability? 


Regexps are:
- cryptic and error prone (you can make them more readable, but the 
  notation is deliberately both terse and powerful, which means that 
  small changes can have large effects in behaviour); the "error prone" 
  part does not mean that a regexp is unreliable, but that writing one 
  which is _correct_ for your task can be difficult, and also difficult 
  to debug

- have a compile step, which slows things down
- can be slower to execute as well, as a regexp does a bunch of 
  housekeeping for you


The more complex the tool the more... indirection between your solution 
using that tool and the smallest thing which needs to be done, and often 
the slower the solution. This isn't absolute;  there are times for the 
complex tool.


Common opinion here is often that if you're doing simple fixed-string 
things such as your task, which was finding instances of a fixed string, 
just use the existing str methods. You'll end up writing what you need 
directly and overtly.
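
As a small sketch of what "the existing str methods" cover for a fixed
string (the example string and key are the ones used elsewhere in this
thread; the variable names are just for illustration):

```python
example = 'X - abc_degree + 1 + qq + abc_degree + 1'
key = 'abc_degree + 1'

# Membership test: is the fixed string present at all?
present = key in example

# Count of non-overlapping occurrences.
occurrences = example.count(key)

# Offset of the first occurrence, or -1 if absent.
first = example.find(key)

# Offset of the next occurrence, searching from just past the first one.
second = example.find(key, first + len(key))

print(present, occurrences, first, second)
```

None of these need the key to be escaped, because no character in it is
treated specially.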


I've a personal maxim that one should use the "smallest" tool which 
succinctly solves the problem. I usually use it to choose a programming 
language (eg sed vs awk vs shell vs python in loose order of problem 
difficulty), but it applies also to choosing tools within a language.


Cheers,
Cameron Simpson 
--
https://mail.python.org/mailman/listinfo/python-list


Re: Python 3.10 Fizzbuzz

2023-02-28 Thread Oscar Benjamin
On Tue, 28 Feb 2023 at 20:55, Mats Wichmann  wrote:
>
> On 2/27/23 16:42, Oscar Benjamin wrote:
> > On Mon, 27 Feb 2023 at 21:06, Ethan Furman  wrote:
> >>
> >> On 2/27/23 12:20, rbowman wrote:
> >>
> >>   > "By using Black, you agree to cede control over minutiae of hand-
> >>   > formatting. In return, Black gives you speed, determinism, and freedom
> >>   > from pycodestyle nagging about formatting. You will save time and 
> >> mental
> >>   > energy for more important matters."
> >>   >
> >>   > Somehow I don't think we would get along very well. I'm a little on the
> >>   > opinionated side myself.
> >>
> >> I personally cannot stand Black.  It feels like every major choice it 
> >> makes (and some minor ones) are exactly the
> >> opposite of the choice I make.
> >
> > I agree partially. There are two types of decisions black makes:
> >
> > 1. Leave the code alone because it seems okay or make small modifications.
> > 2. Reformat the code because it violates some generic rule (like line
> > too long or something).
> >
> > I've recently tried Black and mostly for my code it seems to go with 1
> > (code looks okay). There might be some minor changes like double vs
> > single quotes but I really don't care about those. In that sense me
> > and Black seem to agree.
> >
> > However I have also reviewed code where it is clear that the author
> > has used black and their code came under case 2. In that case Black
> > seems to produce awful things. What I can't understand is someone
> > accepting the awful rewrite rather than just fixing the code. Treating
> > Black almost like a linter makes sense to me but accepting the
> > rewrites that it offers for bad code does not.
>
> The amount of disagreement you see here and elsewhere are exactly why
> Black  is like it is - virtually without options.  It doesn't aim to
> solve the challenge of producing The Most Beautiful Code Layout, for
> *you*, or even for any moderately sized group of programmers.  Instead
> it's to remove the bickering:
> 1. we choose to use black for a project.
> 2. black runs automatically
> 3. there is now no need to spend cycles thinking about code-style
> aspects in reviews, or when we're making changes, because black makes
> sure the code aligns with the chosen style (1).

The problem is that although Black runs automatically it doesn't solve
the code problems automatically. Instead it takes something
questionable and produces something worse. If Black just rejected the
author's code and told them to write something better then they
probably would produce something better than what Black produces.

The limitation of Black is that it only reformats but usually at the
point when it does that the option of reformatting is not really the
thing that needs doing. Instead the right option is something like
introducing a new variable to split one statement into two but Black
just goes ahead and reformats without considering that option.
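
A made-up illustration of that kind of split. It does not show Black's
actual output; the variable names and numbers are invented, and the only
point is that naming an intermediate value shortens each statement without
changing the result.

```python
import math

x1, y1, x2, y2 = 0.0, 0.0, 3.0, 4.0
scale, offset = 2.0, 1.0

# One statement doing everything at once.
result_long = scale * math.sqrt((x2 - x1) ** 2 + (y2 - y1) ** 2) + offset

# Two statements, with a new variable naming the intermediate value.
distance = math.sqrt((x2 - x1) ** 2 + (y2 - y1) ** 2)
result_split = scale * distance + offset

print(result_long, result_split)
```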

I'm fine with not arguing about what kinds of quotes to use but that
doesn't mean that I'll accept any output from Black without arguing
about the code being improved.

--
Oscar
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: How to escape strings for re.finditer?

2023-02-28 Thread Thomas Passin

On 2/28/2023 2:40 PM, David Raymond wrote:

With a slight tweak to the simple loop code using .find() it becomes a third 
faster than the RE version though.


def using_simple_loop2(key, text):
    matches = []
    keyLen = len(key)
    start = 0
    while (foundSpot := text.find(key, start)) > -1:
        start = foundSpot + keyLen
        matches.append((foundSpot, start))
    return matches


using_simple_loop: [0.1732664997689426, 0.1601669997908175, 
0.15792609984055161, 0.157397349591, 0.15759290009737015]
using_re_finditer: [0.003412699792534113, 0.0032823001965880394, 
0.0033694999292492867, 0.003354900050908327, 0.006998894810677]
using_simple_loop2: [0.00256159994751215, 0.0025471001863479614, 
0.0025424999184906483, 0.0025831996463239193, 0.002999018251896]


On my system the difference is way bigger than that:

KEY = '''it doesn't matter, but in other cases it will.'''

using_simple_loop2: [0.000495502449548, 0.0004844000213779509, 
0.0004862999776378274, 0.0004800999886356294, 0.0004792999825440347]


using_re_finditer: [0.002840900036972016, 0.002833350251794, 
0.002701299963518977, 0.0028105000383220613, 0.0029977999511174858]


Shorter keys show the least differential:

KEY = 'in'

using_simple_loop2: [0.001983499969355762, 0.0019614999764598906, 
0.0019617999787442386, 0.002027600014116615, 0.0020669000223279]


using_re_finditer: [0.002787900040857494, 0.0027620999608188868, 
0.0027723999810405076, 0.002776700013782829, 0.002946800028439611]


Brilliant!

Python 3.10.9
Windows 10 AMD64 (build 10.0.19044) SP0
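
Timings like the ones above could be collected with a harness along these
lines. This is only a sketch: the text that was searched is not shown in
this message, so TEXT below is an assumed stand-in, and using_re_finditer
is a guess at what the regex version looks like. The repeat and number
values are arbitrary.

```python
import re
import timeit


def using_re_finditer(key, text):
    # Escape the key so regex metacharacters are matched literally.
    return [(m.start(), m.end()) for m in re.finditer(re.escape(key), text)]


def using_simple_loop2(key, text):
    matches = []
    key_len = len(key)
    start = 0
    while (found := text.find(key, start)) > -1:
        start = found + key_len
        matches.append((found, start))
    return matches


# Assumed test data, not the data used for the numbers in this thread.
KEY = "abc_degree + 1"
TEXT = "X - abc_degree + 1 + qq + abc_degree + 1 " * 1000

for func in (using_re_finditer, using_simple_loop2):
    # Five repeats of 10 calls each.
    timings = timeit.repeat(lambda: func(KEY, TEXT), number=10, repeat=5)
    print(f"{func.__name__}: {timings}")
```

The absolute numbers will differ by machine and Python version; only the
ratio between the two functions on the same system is worth comparing.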

--
https://mail.python.org/mailman/listinfo/python-list


RE: Re: Python 3.10 Fizzbuzz

2023-02-28 Thread avi.e.gross
Karsten,

Would it be OK if we paused this discussion a day till February is History?

Sarcasm aside, I repeat, the word black has many unrelated meanings as
presumably this case includes. And for those who do not keep close track of
the local US nonsense, February has for some reason been dedicated to be a
National Black History Month.

Can software violate a code for human conduct? The recent AI news suggests
it does! LOL!

But you know, if you hire a program to tell you if your code passes a
designated series of tests and it just points out where it did not, and
suggests changes that may put you in alignment, that by itself is not
abusive. But if you did not ask for its opinion, yes, it can be annoying
as being unsolicited.

Humans can be touchy and lose context. I have people in my life who
magically ignore my carefully thought-out phrases like "If ..." by acting as
if I had said something rather than IF something. Worse, they hear
abstractions too concretely. I might be discussing COVID and saying "If
COVID was as lethal as it used to be ..." and they interject BUT IT ISN'T.
OK, listen again. I am abstract and trying to make a point. The fact that
you think it isn't is nice to note but hardly relevant to a WHAT IF
question.

So a program designed by programmers, a few of whom are not well known for
how they interact with humans but who nonetheless insist on designing user
interfaces by themselves, may well come across negatively. The reality is
humans vary tremendously and one may appreciate feedback as a way to improve
and get out of the red and the other will assume it is a put down that
leaves them black and blue, even when the words are the same.

-Original Message-
From: Python-list  On
Behalf Of Karsten Hilbert
Sent: Tuesday, February 28, 2023 2:44 PM
To: pythonl...@danceswithmice.info
Cc: python-list@python.org
Subject: Aw: Re: Python 3.10 Fizzbuzz

> > I've never tried Black or any other code formatter, but I'm sure we 
> > wouldn't get on.
>
> Does this suggest, that because Black doesn't respect other people's 
> opinions and feelings, that it wouldn't meet the PSF's Code of Conduct?

That much depends on The Measure Of A Man.

Karsten
--
https://mail.python.org/mailman/listinfo/python-list



RE: How to escape strings for re.finditer?

2023-02-28 Thread avi.e.gross
David,

Your results suggest we need to be reminded that lots depends on other
factors. There are multiple versions/implementations of python out there
including some written in C but also other underpinnings. Each can often
have sections of pure python code replaced carefully with libraries of
compiled code, or not. So your results will vary.

Just as an example, assume you derive a type of your own as a subclass of
str and you over-ride the find method by writing it in pure python using
loops and maybe add a few bells and whistles. If you used your improved
algorithm using this variant of str, might it not be quite a bit slower?
Imagine how much slower if your improvement also implemented caching and
logging and the option of ignoring case which are not really needed here.
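
A sketch of that thought experiment. SlowStr is a hypothetical subclass
written for this illustration; its find() handles only non-negative start
and end values, which is all the loop below needs.

```python
class SlowStr(str):
    """Hypothetical str subclass whose find() is written in pure Python."""

    def find(self, sub, start=0, end=None):
        if end is None:
            end = len(self)
        # Position-by-position scan, driven by the interpreter instead of
        # the compiled implementation behind str.find.
        for i in range(start, end - len(sub) + 1):
            if self[i:i + len(sub)] == sub:
                return i
        return -1


def using_simple_loop2(key, text):
    matches = []
    key_len = len(key)
    start = 0
    # text.find resolves to SlowStr.find when text is a SlowStr.
    while (found := text.find(key, start)) > -1:
        start = found + key_len
        matches.append((found, start))
    return matches


plain = 'X - abc_degree + 1 + qq + abc_degree + 1'
key = 'abc_degree + 1'

print(using_simple_loop2(key, plain))
print(using_simple_loop2(key, SlowStr(plain)))
```

Both calls return the same spans; the second one simply does its scanning
in Python code, which is where the slowdown would come from.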

This type of thing can happen in many other scenarios. Some shared module may
be slow and later get updated, but not everyone installs the update, so
performance stats can vary wildly.

Some people advocate using some functional programming tactics, in various
languages, partially because the more general loops are SLOW. But that is
largely because some of the functional stuff is a compiled function that
hides the loops inside a faster environment than the interpreter.
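
A rough way to see this, assuming CPython, where the built-in sum() runs
its loop in compiled C code. The function names and the size of the list
are arbitrary choices for this sketch.

```python
import timeit

values = list(range(100_000))


def total_with_loop(numbers):
    # Every iteration is executed by the bytecode interpreter.
    total = 0
    for n in numbers:
        total += n
    return total


def total_with_builtin(numbers):
    # The loop still happens, but inside the built-in.
    return sum(numbers)


loop_time = timeit.timeit(lambda: total_with_loop(values), number=20)
builtin_time = timeit.timeit(lambda: total_with_builtin(values), number=20)
print(f"loop: {loop_time:.4f}s  builtin: {builtin_time:.4f}s")
```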

-Original Message-
From: Python-list  On
Behalf Of David Raymond
Sent: Tuesday, February 28, 2023 2:40 PM
To: python-list@python.org
Subject: RE: How to escape strings for re.finditer?

> I wrote my previous message before reading this.  Thank you for the test
> you ran -- it answers the question of performance.  You show that
> re.finditer is 30x faster, so that certainly recommends that over a simple
> loop, which introduces looping overhead.

>>      def using_simple_loop(key, text):
>>          matches = []
>>          for i in range(len(text)):
>>              if text[i:].startswith(key):
>>                  matches.append((i, i + len(key)))
>>          return matches
>>
>>      using_simple_loop: [0.1395295020792, 0.1306313000456,
>>      0.1280345001249, 0.1318618002423, 0.1308461032626]
>>      using_re_finditer: [0.00386140005233, 0.00406190124297,
>>      0.00347899970256, 0.00341310216218, 0.003732001273]


With a slight tweak to the simple loop code using .find() it becomes a third
faster than the RE version though.


def using_simple_loop2(key, text):
    matches = []
    keyLen = len(key)
    start = 0
    while (foundSpot := text.find(key, start)) > -1:
        start = foundSpot + keyLen
        matches.append((foundSpot, start))
    return matches


using_simple_loop: [0.1732664997689426, 0.1601669997908175,
0.15792609984055161, 0.157397349591, 0.15759290009737015]
using_re_finditer: [0.003412699792534113, 0.0032823001965880394,
0.0033694999292492867, 0.003354900050908327, 0.006998894810677]
using_simple_loop2: [0.00256159994751215, 0.0025471001863479614,
0.0025424999184906483, 0.0025831996463239193, 0.002999018251896]
-- 
https://mail.python.org/mailman/listinfo/python-list



RE: Python 3.10 Fizzbuzz

2023-02-28 Thread avi.e.gross
Dave, 

Is it rude to name something "black" to make it hard for some of us to remind 
them of the rules or claim that our personal style is so often the opposite 
that it should be called "white" or at least a shade of gray?

The usual kidding aside, I have no idea why it was called black but in all 
seriousness this is not a black and white issue. Opinions may differ when a 
language provides many valid options on how to write code. If someone wants to 
standardize and impose some decisions, fine. But others may choose their own 
variant and take their chances.

I, for example, like certain features in many languages where if I am only 
doing one short line of code, I prefer to skip the fanfare. Consider a 
(non-Python) example:

if (condition) {
    print(5)
}

Who needs that nonsense? If the language allows it:

if (condition) print(5)

Or in Python:

if condition: print(5)

Rather than a multi-line version.

But will I always use the short version? Nope. If I expect to add code later, 
might as well start with the multi-line form. If the line gets too long, ditto. 
And, quite importantly, if editing other people's code, I look around and 
follow their lead.

There often is no (one) right way to do things but there often are many wrong 
ways. Tools like black (which I know nothing detailed about) can be helpful. 
But I have experienced times when I wrote carefully crafted code (as it happens, 
in R inside the RStudio editor), selected a region and asked it to reformat, 
and gasped at how it ruined my neatly arranged code. I just wanted the bit I 
had added to be formatted the same as the rest already was, not a complete 
re-make. Luckily, there is an undo. 

There must be some parameterized tools out there that let you set up a profile 
of your own personal preferences that help keep your code in your own preferred 
format, and re-arrange it after you have done some editing like copying from 
somewhere else so it fits together.

-Original Message-
From: Python-list  On 
Behalf Of dn via Python-list
Sent: Tuesday, February 28, 2023 2:22 PM
To: python-list@python.org
Subject: Re: Python 3.10 Fizzbuzz

On 28/02/2023 12.55, Rob Cliffe via Python-list wrote:
> 
> 
> On 27/02/2023 21:04, Ethan Furman wrote:
>> On 2/27/23 12:20, rbowman wrote:
>>
>> > "By using Black, you agree to cede control over minutiae of hand- 
>> > formatting. In return, Black gives you speed, determinism, and 
>> > freedom from pycodestyle nagging about formatting. You will save 
>> > time and
>> mental
>> > energy for more important matters."
>> >
>> > Somehow I don't think we would get along very well. I'm a little on 
>> > the opinionated side myself.
>>
>> I personally cannot stand Black.  It feels like every major choice it 
>> makes (and some minor ones) are exactly the opposite of the choice I 
>> make.
>>
>> --
>> ~Ethan~
> I've never tried Black or any other code formatter, but I'm sure we 
> wouldn't get on.

Does this suggest, that because Black doesn't respect other people's opinions 
and feelings, that it wouldn't meet the PSF's Code of Conduct?

--
Regards,
=dn
--
https://mail.python.org/mailman/listinfo/python-list



Re: Python 3.10 Fizzbuzz

2023-02-28 Thread Mats Wichmann

On 2/27/23 16:42, Oscar Benjamin wrote:

On Mon, 27 Feb 2023 at 21:06, Ethan Furman  wrote:


On 2/27/23 12:20, rbowman wrote:

  > "By using Black, you agree to cede control over minutiae of hand-
  > formatting. In return, Black gives you speed, determinism, and freedom
  > from pycodestyle nagging about formatting. You will save time and mental
  > energy for more important matters."
  >
  > Somehow I don't think we would get along very well. I'm a little on the
  > opinionated side myself.

I personally cannot stand Black.  It feels like every major choice it makes 
(and some minor ones) are exactly the
opposite of the choice I make.


I agree partially. There are two types of decisions black makes:

1. Leave the code alone because it seems okay or make small modifications.
2. Reformat the code because it violates some generic rule (like line
too long or something).

I've recently tried Black and mostly for my code it seems to go with 1
(code looks okay). There might be some minor changes like double vs
single quotes but I really don't care about those. In that sense me
and Black seem to agree.

However I have also reviewed code where it is clear that the author
has used black and their code came under case 2. In that case Black
seems to produce awful things. What I can't understand is someone
accepting the awful rewrite rather than just fixing the code. Treating
Black almost like a linter makes sense to me but accepting the
rewrites that it offers for bad code does not.


The amount of disagreement you see here and elsewhere are exactly why 
Black  is like it is - virtually without options.  It doesn't aim to 
solve the challenge of producing The Most Beautiful Code Layout, for 
*you*, or even for any moderately sized group of programmers.  Instead 
it's to remove the bickering:

1. we choose to use black for a project.
2. black runs automatically
3. there is now no need to spend cycles thinking about code-style 
aspects in reviews, or when we're making changes, because black makes 
sure the code aligns with the chosen style (1).


Many teams find the removal of this potential disagreement valuable - 
there's plenty of more important stuff to spend time on. If as an 
individual user, not trying to conform to style choices of a project, it 
doesn't appeal, there's no need to fuss with it.


One can certainly pick a different code style, and make sure it's 
captured in the rules for one of the several more flexible formatting 
tools (for example, I *used* to use yapf pretty regularly, and had that 
tuned as I wanted)

--
https://mail.python.org/mailman/listinfo/python-list


RE: How to escape strings for re.finditer?

2023-02-28 Thread avi.e.gross
This message is more for Thomas than Jen,

You made me think of what happens in fairly large cases. What happens if I ask 
you to search a thousand pages looking for your name? 

One solution might be to break the problem into parts that can be run in 
independent threads or processes and perhaps across different CPU's or on many 
machines at once. Think of it as a variant on a merge sort where each chunk 
returns where it found one or more items and then those are gathered together 
and merged upstream.

The problem is you cannot just randomly divide the text.  Any matches across a 
divide are lost. So if you know you are searching for "Thomas Passin" you need 
each chunk to overlap the next by at least one character less than the length 
of that string. The division would not be a pure binary tree, and if the 
pattern could match strings of varying sizes, the larger overlap that requires 
could report the same match twice. So the merging part would obviously have to 
eventually remove those duplicates.
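
A minimal sketch of the chunk-and-merge idea for a fixed search string.
The function names find_all and chunked_find are invented for this
illustration, and the chunks are searched one after another here; handing
each chunk to a separate thread, process, or machine is left out. Note
that find_all reports every occurrence, including overlapping ones, so
that the chunked and unchunked results are directly comparable.

```python
def find_all(key, text, offset=0):
    """Return (start, end) spans of every occurrence of key in text."""
    spans = []
    start = 0
    while (pos := text.find(key, start)) > -1:
        spans.append((pos + offset, pos + offset + len(key)))
        start = pos + 1
    return spans


def chunked_find(key, text, chunk_size):
    overlap = len(key) - 1
    found = set()
    for chunk_start in range(0, len(text), chunk_size):
        # Extend each chunk by `overlap` characters so a match that
        # straddles the boundary is still fully inside one chunk.
        chunk = text[chunk_start:chunk_start + chunk_size + overlap]
        # The set is the merge step: it would drop any duplicate spans.
        found.update(find_all(key, chunk, offset=chunk_start))
    return sorted(found)


text = "Thomas Passin wrote to Thomas Passin about Thomas Passin."
key = "Thomas Passin"

for size in (5, 7, 16, 100):
    print(size, chunked_find(key, text, size))
```

With a fixed key and an overlap of len(key) - 1, each match falls entirely
inside exactly one extended chunk, so the set never actually has to remove
anything; it is there for the variable-length case described above.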

I have often wondered how Google and other such services are able to find 
millions of things in hardly any time and arguably never show most of them as 
who looks past a few pages/screens?

I think much of that may involve other techniques including quite a bit of 
pre-indexing. But they also seem to enlist lots of processors that each do the 
search on a subset of the problem space and combine and prioritize.

-Original Message-
From: Python-list  On 
Behalf Of Thomas Passin
Sent: Tuesday, February 28, 2023 1:31 PM
To: python-list@python.org
Subject: Re: How to escape strings for re.finditer?

On 2/28/2023 1:07 PM, Jen Kris wrote:
> 
> Using str.startswith is a cool idea in this case.  But is it better 
> than regex for performance or reliability?  Regex syntax is not a 
> model of simplicity, but in my simple case it's not too difficult.

The trouble is that we don't know what your case really is.  If you are talking 
about a short pattern like your example and a small text to search, and you 
don't need to do it too often, then my little code example is probably ideal. 
Reliability wouldn't be an issue, and performance would not be relevant.  If 
your case is going to be much larger, called many times in a loop, or be much 
more complicated in some other way, then a regex or some other approach is 
likely to be much faster.


> Feb 27, 2023, 18:52 by li...@tompassin.net:
> 
> On 2/27/2023 9:16 PM, avi.e.gr...@gmail.com wrote:
> 
> And, just for fun, since there is nothing wrong with your code,
> this minor change is terser:
> 
> example = 'X - abc_degree + 1 + qq + abc_degree + 1'
> for match in re.finditer(re.escape('abc_degree + 1'), example):
> ...     print(match.start(), match.end())
> ...
> 4 18
> 26 40
> 
> 
> Just for more fun :) -
> 
> Without knowing how general your expressions will be, I think the
> following version is very readable, certainly more readable than
> regexes:
> 
> example = 'X - abc_degree + 1 + qq + abc_degree + 1'
> KEY = 'abc_degree + 1'
> 
> for i in range(len(example)):
>     if example[i:].startswith(KEY):
>         print(i, i + len(KEY))
> # prints:
> 4 18
> 26 40
> 
> If you may have variable numbers of spaces around the symbols, OTOH,
> the whole situation changes and then regexes would almost certainly
> be the best approach. But the regular expression strings would
> become harder to read.
> -- 
> https://mail.python.org/mailman/listinfo/python-list
> 
> 

--
https://mail.python.org/mailman/listinfo/python-list



RE: How to escape strings for re.finditer?

2023-02-28 Thread avi.e.gross
Jen,

 

I had no doubt the code you ran was indented properly or it would not work.

 

I am merely letting you know that somewhere in the process of copying the code 
or the transition between mailers, my version is messed up. It happens to be 
easy for me to fix but I sometimes see garbled code I then simply ignore.

 

At times what may help is to leave blank lines that python ignores but also 
keeps the line rearrangements minimal.

 

On to your real question.

 

In my OPINION, there are many interesting questions that can get in the way of 
just getting a working solution. Some may be better in some abstract way but 
except for big projects it often hardly matters.

 

So regex is one thing, or rather a cluster of things, and a list comp is 
something completely different. They are both tools you can use and abuse or 
lose.

 

The distinction I believe we started with was how to find a fixed string inside 
another fixed string in as many places as needed and perhaps return offset 
info. So this can be solved in too many ways using a side of python focused on 
pure text. As discussed, solutions can include explicit loops such as “for” and 
“while” and their syntactic sugar cousin of a list comp. Not mentioned yet are 
other techniques like a recursive function that finds the first and passes on 
the rest of the string to itself to find the rest, or various functional 
programming techniques that may do sort of hidden loops. YOU DO NOT NEED ALL OF 
THEM but it can be interesting to learn.
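
For the curious, the recursive variant might look something like this. The
name find_recursive is made up for this sketch, and since the recursion
depth grows with the number of matches it is a learning exercise rather
than something to use on large inputs.

```python
def find_recursive(key, text, offset=0):
    """Return (start, end) spans of key in text, found recursively."""
    pos = text.find(key)
    if pos == -1:
        return []
    end = pos + len(key)
    # Record this match, then hand the rest of the string to ourselves.
    rest = find_recursive(key, text[end:], offset + end)
    return [(pos + offset, end + offset)] + rest


example = 'X - abc_degree + 1 + qq + abc_degree + 1'
print(find_recursive('abc_degree + 1', example))
```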

 

Regex is a completely different universe that is a bit more of MORE. If I ask 
you for a ride to the grocery store, I might expect you to show up with a car 
and not a James Bond vehicle that also is a boat, submarine, airplane, and 
maybe spaceship. Well, Regex is the latter. And in your case, it is this 
complexity that meant you had to convert your text so it will not see what it 
considers commands or hints.
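
A small sketch of why the conversion matters, using only the example string from this thread. Unescaped, the " +" in the search text is read as "one or more spaces", so the literal plus sign is never matched.

```python
import re

example = 'X - abc_degree + 1 + qq + abc_degree + 1'
key = 'abc_degree + 1'

# Unescaped: "+" is a repetition operator applied to the preceding space.
raw_hits = [(m.start(), m.end()) for m in re.finditer(key, example)]

# Escaped: every character of key is taken literally.
escaped_hits = [(m.start(), m.end()) for m in re.finditer(re.escape(key), example)]

print(raw_hits)
print(escaped_hits)
```

The first list comes back empty and the second holds the two expected spans.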

 

In normal use, put a bit too simply, it wants a carefully crafted pattern to be 
spelled out. From that it builds, and more or less compiles, an often complex 
matcher that represents its understanding of what you asked for. The simplest 
pattern is to match EXACTLY THIS. That is your case.

 

A more complex pattern may say to match Boston OR Chicago followed by any 
amount of whitespace, then a run of 3 to 5 digits, which should then not be 
followed by something specific. Oh, and by the way, save selected parts in 
parentheses to be accessed as \1 or \2 so I can ask you to do things like match 
a word followed by itself. It goes on and on.
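
To make that concrete, here is one way such a pattern could be spelled in Python. The sample text and the choice of "something specific" (another digit or a dash) are assumptions made up for this sketch.

```python
import re

# City name, optional whitespace, 3 to 5 digits, and then NOT a digit or a dash.
# The negative lookahead (?!...) is the "should not be followed by" part.
city_code = re.compile(r'(Boston|Chicago)\s*(\d{3,5})(?![\d-])')

text = 'Boston 02134, Chicago60601, Boston 12, Chicago 123-4567'
for m in city_code.finditer(text):
    # group(1) and group(2) are the parenthesized parts saved by the pattern.
    print(m.group(1), m.group(2), m.span())
```

Only the first two entries match: "Boston 12" has too few digits, and "Chicago 123-4567" is rejected by the lookahead.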

 

Be warned RE is implemented now all over the place including outside the usual 
UNIX roots and there are somewhat different versions. For your need, it does 
not matter.

 

The compiled monstrosity though can be fairly fast and might be a tad hard for 
you to write by yourself as a bunch of nested if statements that weirdly match 
various patterns with some look-ahead or look-behind.

 

What you are being told is that despite this being way more than you asked for, 
it not only works but is fairly fast when doing the simple thing you asked for. 
That may be why a text version you are looking for is hard to find.

 

I am not clear what exactly the rest of your project is about but my guess is 
your first priority is completing it decently and not to try umpteen methods 
and compare them. Not today. Of course if the working version is slow and you 
profile it and find this part seems to be holding it back, it may be worth 
examining.

 

 

From: Jen Kris  
Sent: Tuesday, February 28, 2023 12:58 PM
To: avi.e.gr...@gmail.com
Cc: 'Python List' 
Subject: RE: How to escape strings for re.finditer?

 

The code I sent is correct, and it runs here.  Maybe you received it with a 
carriage return removed, but on my copy after posting, it is correct:

 

import re

example = 'X - abc_degree + 1 + qq + abc_degree + 1'
find_string = re.escape('abc_degree + 1')
for match in re.finditer(find_string, example):
    print(match.start(), match.end())

One question:  several people have made suggestions other than regex (not the 
terser regex example you showed below).  Is there a reason why regex is not 
preferred to, for example, a list comp?  Performance?  Reliability?  

Feb 27, 2023, 18:16 by avi.e.gr...@gmail.com  :

Jen,

 

Can you see what SOME OF US see as ASCII text? We can help you better if we get 
code that can be copied and run as-is.

 

What you sent is not terse. It is wrong. It will not run on any python 
interpreter because you somehow lost a carriage return and indent.

 

This is what you sent:

 

example = 'X - abc_degree + 1 + qq + abc_degree + 1'

find_string = re.escape('abc_degree + 1') for match in re.finditer(find_string, 
example):

print(match.start(), match.end())

 

This is code indented properly:

 

example = 'X - abc_degree + 1 + qq + abc_degree + 1'

find_string = 

Aw: Re: Python 3.10 Fizzbuzz

2023-02-28 Thread Karsten Hilbert
> > I've never tried Black or any other code formatter, but I'm sure we
> > wouldn't get on.
>
> Does this suggest, that because Black doesn't respect other people's
> opinions and feelings, that it wouldn't meet the PSF's Code of Conduct?

That much depends on The Measure Of A Man.

Karsten
-- 
https://mail.python.org/mailman/listinfo/python-list


RE: How to escape strings for re.finditer?

2023-02-28 Thread David Raymond
> I wrote my previous message before reading this.  Thank you for the test you 
> ran -- it answers the question of performance.  You show that re.finditer is 
> 30x faster, so that certainly recommends that over a simple loop, which 
> introduces looping overhead.  

>>      def using_simple_loop(key, text):
>>          matches = []
>>          for i in range(len(text)):
>>              if text[i:].startswith(key):
>>                  matches.append((i, i + len(key)))
>>          return matches
>>
>>      using_simple_loop: [0.1395295020792, 0.1306313000456, 
>> 0.1280345001249, 0.1318618002423, 0.1308461032626]
>>      using_re_finditer: [0.00386140005233, 0.00406190124297, 
>> 0.00347899970256, 0.00341310216218, 0.003732001273]


With a slight tweak to the simple loop code using .find() it becomes a third 
faster than the RE version though.


def using_simple_loop2(key, text):
    matches = []
    keyLen = len(key)
    start = 0
    while (foundSpot := text.find(key, start)) > -1:
        start = foundSpot + keyLen
        matches.append((foundSpot, start))
    return matches


using_simple_loop: [0.1732664997689426, 0.1601669997908175, 
0.15792609984055161, 0.157397349591, 0.15759290009737015]
using_re_finditer: [0.003412699792534113, 0.0032823001965880394, 
0.0033694999292492867, 0.003354900050908327, 0.006998894810677]
using_simple_loop2: [0.00256159994751215, 0.0025471001863479614, 
0.0025424999184906483, 0.0025831996463239193, 0.002999018251896]
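
One thing the timings do not show is that the approaches are not strictly interchangeable. The startswith loop tests every offset, so it reports overlapping matches, while re.finditer and the find-and-jump loop resume after the end of each match. A small sketch (the function names here are made up for the comparison):

```python
import re


def with_startswith(key, text):
    # Tests every offset; str.startswith takes a start index, so no slicing.
    return [(i, i + len(key)) for i in range(len(text)) if text.startswith(key, i)]


def with_find(key, text):
    matches, start = [], 0
    while (found := text.find(key, start)) > -1:
        start = found + len(key)  # jump past the match
        matches.append((found, start))
    return matches


def with_finditer(key, text):
    return [m.span() for m in re.finditer(re.escape(key), text)]


print(with_startswith('aa', 'aaaa'))  # [(0, 2), (1, 3), (2, 4)]
print(with_find('aa', 'aaaa'))        # [(0, 2), (2, 4)]
print(with_finditer('aa', 'aaaa'))    # [(0, 2), (2, 4)]
```

For the example string in this thread all three agree, so which behaviour is wanted only matters when the key can overlap itself.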
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Python 3.10 Fizzbuzz

2023-02-28 Thread Chris Angelico
On Wed, 1 Mar 2023 at 06:25, dn via Python-list  wrote:
>
> On 28/02/2023 12.55, Rob Cliffe via Python-list wrote:
> >
> >
> > On 27/02/2023 21:04, Ethan Furman wrote:
> >> On 2/27/23 12:20, rbowman wrote:
> >>
> >> > "By using Black, you agree to cede control over minutiae of hand-
> >> > formatting. In return, Black gives you speed, determinism, and freedom
> >> > from pycodestyle nagging about formatting. You will save time and mental
> >> > energy for more important matters."
> >> >
> >> > Somehow I don't think we would get along very well. I'm a little on the
> >> > opinionated side myself.
> >>
> >> I personally cannot stand Black.  It feels like every major choice it
> >> makes (and some minor ones) are exactly the opposite of the choice I
> >> make.
> >>
> >> --
> >> ~Ethan~
> > I've never tried Black or any other code formatter, but I'm sure we
> > wouldn't get on.
>
> Does this suggest, that because Black doesn't respect other people's
> opinions and feelings, that it wouldn't meet the PSF's Code of Conduct?
>

Yes, so if Black ever posts on this list, it will probably get banned...

ChrisA
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Python 3.10 Fizzbuzz

2023-02-28 Thread dn via Python-list

On 28/02/2023 12.55, Rob Cliffe via Python-list wrote:



On 27/02/2023 21:04, Ethan Furman wrote:

On 2/27/23 12:20, rbowman wrote:

> "By using Black, you agree to cede control over minutiae of hand-
> formatting. In return, Black gives you speed, determinism, and freedom
> from pycodestyle nagging about formatting. You will save time and mental
> energy for more important matters."
>
> Somehow I don't think we would get along very well. I'm a little on the
> opinionated side myself.

I personally cannot stand Black.  It feels like every major choice it 
makes (and some minor ones) are exactly the opposite of the choice I 
make.


--
~Ethan~
I've never tried Black or any other code formatter, but I'm sure we 
wouldn't get on.


Does this suggest, that because Black doesn't respect other people's 
opinions and feelings, that it wouldn't meet the PSF's Code of Conduct?


--
Regards,
=dn
--
https://mail.python.org/mailman/listinfo/python-list


Re: How to escape strings for re.finditer?

2023-02-28 Thread Thomas Passin

On 2/28/2023 11:48 AM, Jon Ribbens via Python-list wrote:

On 2023-02-28, Thomas Passin  wrote:

...


It is interesting, though, how pre-processing the search pattern can
improve search times if you can afford the pre-processing.  Here's a
paper on rapidly finding matches when there may be up to one misspelled
character.  It's easy enough to implement, though in Python you can't
take the additional step of tuning it to stay in cache.

https://Robert.Muth.Org/Papers/1996-Approx-Multi.Pdf


You've somehow title-cased that URL. The correct URL is:

https://robert.muth.org/Papers/1996-approx-multi.pdf


Thanks, not sure how that happened ...

--
https://mail.python.org/mailman/listinfo/python-list


ANN: DIPY 1.6.0

2023-02-28 Thread Eleftherios Garyfallidis
Hello all,


We are excited to announce a new release of DIPY: DIPY 1.6.0 is fresh out of
the oven!


In addition, registration for the oceanic DIPY workshop 2023 (April 24-28)
is now open! Our comprehensive program is designed to equip you with the
skills and knowledge needed to master the latest techniques and tools in
structural and diffusion imaging.  An intense hands-on experience in Santa
Monica, Los Angeles! See the exquisite program for this highly
anticipated event.



DIPY 1.6.0 (Monday, 16 January 2023)

The release 1.6.0 received contributions from 22 developers (the full
release notes are at:
https://dipy.org/documentation/1.6.0./release_notes/release1.6/).


Thank you all for your contributions and feedback!


Please check the 1.6.0 API changes.


Highlights of 1.6.0 release include:

   - NF: Unbiased groupwise linear bundle registration added.
   - NF: MAP+ constraints added.
   - Generalized PCA to less than 3 spatial dims.
   - Added positivity constraints to QTI.
   - New functionality to apply Symmetric Diffeomorphic Registration to
     points/streamlines.
   - New Human Connectome Project (HCP) data fetcher added.
   - New Healthy Brain Network (HBN) data fetcher added.
   - Multiple Workflows updated (DTIFlow, LPCAFlow, MPPCA) and added
     (RUMBAFlow).
   - Ability to handle VTP files.
   - Large codebase cleaning.
   - Large documentation update.
   - Closed 75 issues and merged 41 pull requests.

To upgrade or install DIPY


Run the following command in your terminal:



pip install --upgrade dipy


or


conda install -c conda-forge dipy


This version of DIPY depends on nibabel (3.0.0+).

For visualization you need FURY (0.8.0+).


Please support us by citing DIPY in your papers using the following
DOI: 10.3389/fninf.2014.8



Questions or suggestions?



For any questions go to https://dipy.org, or send an e-mail to
d...@python.org 

We also have an instant messaging service and chat room available at
https://gitter.im/dipy/dipy

Finally, a new forum is available at
https://github.com/dipy/dipy/discussions


Have a wonderful time using the new version.


On behalf of the DIPY developers,

Eleftherios Garyfallidis, Ariel Rokem, Serge Koudoro

https://dipy.org/contributors
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: How to escape strings for re.finditer?

2023-02-28 Thread Thomas Passin

On 2/28/2023 12:57 PM, Jen Kris via Python-list wrote:

The code I sent is correct, and it runs here.  Maybe you received it with a 
carriage return removed, but on my copy after posting, it is correct:

import re

example = 'X - abc_degree + 1 + qq + abc_degree + 1'
find_string = re.escape('abc_degree + 1')
for match in re.finditer(find_string, example):
    print(match.start(), match.end())

One question:  several people have made suggestions other than regex (not the 
terser regex example you showed below).  Is there a reason why regex is not 
preferred to, for example, a list comp?  Performance?  Reliability?


"Some people, when confronted with a problem, think 'I know, I'll use 
regular expressions.' Now they have two problems."


- 
https://blog.codinghorror.com/regular-expressions-now-you-have-two-problems/


Of course, if you actually read the blog post in the link, there's more 
to it than that...




Feb 27, 2023, 18:16 by avi.e.gr...@gmail.com:


Jen,

Can you see what SOME OF US see as ASCII text? We can help you better if we get 
code that can be copied and run as-is.

  What you sent is not terse. It is wrong. It will not run on any python 
interpreter because you somehow lost a carriage return and indent.

This is what you sent:

example = 'X - abc_degree + 1 + qq + abc_degree + 1'
find_string = re.escape('abc_degree + 1') for match in re.finditer(find_string, 
example):
  print(match.start(), match.end())

This is code indented properly:

example = 'X - abc_degree + 1 + qq + abc_degree + 1'
find_string = re.escape('abc_degree + 1')
for match in re.finditer(find_string, example):
    print(match.start(), match.end())

Of course I am sure you wrote and ran code more like the latter version but 
somewhere in your copy/paste process, 

And, just for fun, since there is nothing wrong with your code, this minor 
change is terser:


>>> example = 'X - abc_degree + 1 + qq + abc_degree + 1'
>>> for match in re.finditer(re.escape('abc_degree + 1'), example):
...     print(match.start(), match.end())
...
4 18
26 40

But note that once you use regular expressions (though not in your case), you 
might match multiple things that are far from the same, such as two repeated 
words of any kind in any case, including "and and" and "so so", or words that 
have multiple doubled letters, as in the stereotypical "bookkeeper". In those 
cases, you may want more than offsets: you may also want to show the exact text 
that matched, or even some characters before and/or after for context.
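
As an illustration of that last point, a repeated-word search might be sketched like this. The sample sentence and the five characters of context are arbitrary choices for the example.

```python
import re

# \1 refers back to whatever the first group matched.
# IGNORECASE lets "So so" count as a repeat as well.
doubled_word = re.compile(r'\b(\w+)\s+\1\b', re.IGNORECASE)

text = 'It rained and and rained, So so much that the bookkeeper left.'
for m in doubled_word.finditer(text):
    # Show the span, the exact text matched, and a little context on each side.
    lo, hi = max(0, m.start() - 5), min(len(text), m.end() + 5)
    print(m.span(), repr(m.group(0)), repr(text[lo:hi]))
```

Here the matched text differs from hit to hit, which is why the match object carries more than just start and end.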


-Original Message-
From: Python-list  On 
Behalf Of Jen Kris via Python-list
Sent: Monday, February 27, 2023 8:36 PM
To: Cameron Simpson 
Cc: Python List 
Subject: Re: How to escape strings for re.finditer?


I haven't tested it either but it looks like it would work.  But for this case 
I prefer the relative simplicity of:

example = 'X - abc_degree + 1 + qq + abc_degree + 1'
find_string = re.escape('abc_degree + 1') for match in re.finditer(find_string, 
example):
  print(match.start(), match.end())

4 18
26 40

I don't insist on terseness for its own sake, but it's cleaner this way.

Jen


Feb 27, 2023, 16:55 by c...@cskk.id.au:


On 28Feb2023 01:13, Jen Kris  wrote:


I went to the re module because the specified string may appear more than once 
in the string (in the code I'm writing).



Sure, but writing a `finditer` for plain `str` is pretty easy (untested):

    pos = 0
    while True:
        found = s.find(substring, pos)
        if found < 0:
            break
        start = found
        end = found + len(substring)
        # ... do whatever with start and end ...
        pos = end

Many people go straight to the `re` module whenever they're looking for 
strings. It is often cryptic, error-prone overkill. Just something to keep in 
mind.

Cheers,
Cameron Simpson 
--
https://mail.python.org/mailman/listinfo/python-list


RE: How to escape strings for re.finditer?

2023-02-28 Thread avi.e.gross
Roel,

You make some good points. One to consider is that when you ask a regular 
expression matcher to search using something that uses NO regular expression 
features, much of the complexity disappears and what it creates is probably 
similar enough to what you get with a string search except that loops and all 
are written as something using fast functions probably written in C. 

That is one reason the roll-your-own versions are at a disadvantage, unless you 
roll your own in a similar way by writing a similar C function.

Nobody has shown us what surely ought to exist: a simple but fast text search 
function that does a similar job. It may still be out there but, as you point 
out, perhaps it is not needed as long as people just use the re version.

Avi

-Original Message-
From: Python-list  On 
Behalf Of Roel Schroeven
Sent: Tuesday, February 28, 2023 4:33 AM
To: python-list@python.org
Subject: Re: How to escape strings for re.finditer?

Op 28/02/2023 om 3:44 schreef Thomas Passin:
> On 2/27/2023 9:16 PM, avi.e.gr...@gmail.com wrote:
>> And, just for fun, since there is nothing wrong with your code, this 
>> minor change is terser:
>>
>> >>> example = 'X - abc_degree + 1 + qq + abc_degree + 1'
>> >>> for match in re.finditer(re.escape('abc_degree + 1'), example):
>> ...     print(match.start(), match.end())
>> ...
>> 4 18
>> 26 40
>
> Just for more fun :) -
>
> Without knowing how general your expressions will be, I think the 
> following version is very readable, certainly more readable than regexes:
>
> example = 'X - abc_degree + 1 + qq + abc_degree + 1'
> KEY = 'abc_degree + 1'
>
> for i in range(len(example)):
>     if example[i:].startswith(KEY):
>         print(i, i + len(KEY))
> # prints:
> 4 18
> 26 40
I think it's often a good idea to use a standard library function instead of 
rolling your own. The issue becomes less clear-cut when the standard library 
doesn't do exactly what you need (as here, where
re.finditer() uses regular expressions while the use case only uses simple 
search strings). Ideally there would be a str.finditer() method we could use, 
but in the absence of that I think we still need to consider using the 
almost-but-not-quite fitting re.finditer().

Two reasons:

(1) I think it's clearer: the name tells us what it does (though of course we 
could solve this in a hand-written version by wrapping it in a suitably named 
function).

(2) Searching for a string in another string, in a performant way, is not as 
simple as it first appears. Your version works correctly, but slowly. In some 
situations it doesn't matter, but in other cases it will. For better 
performance, string searching algorithms jump ahead either when they found a 
match or when they know for sure there isn't a match for some time (see e.g. 
the Boyer–Moore string-search algorithm). 
You could write such a more efficient algorithm, but then it becomes more 
complex and more error-prone. Using a well-tested existing function becomes 
quite attractive.
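
To give a feel for the "jump ahead" idea, here is a rough pure-Python sketch of the Horspool simplification of Boyer-Moore. The function name is made up. CPython's own str.find already does this kind of tuned searching in C, so this version is for illustration only and should not be expected to be faster.

```python
def horspool_find_all(key, text):
    """Non-overlapping (start, end) matches using Horspool-style skips."""
    m, n = len(key), len(text)
    if m == 0 or m > n:
        return []
    # For each character of key except the last: how far the window may shift
    # when that character sits under the window's last position.
    shift = {ch: m - 1 - i for i, ch in enumerate(key[:-1])}
    matches, pos = [], 0
    while pos <= n - m:
        if text[pos:pos + m] == key:
            matches.append((pos, pos + m))
            pos += m  # resume after the match, as re.finditer does
        else:
            # A character that does not occur in key allows a full-length jump.
            pos += shift.get(text[pos + m - 1], m)
    return matches


example = 'X - abc_degree + 1 + qq + abc_degree + 1'
print(horspool_find_all('abc_degree + 1', example))
```

Even this short version has several places to get an off-by-one wrong, which is rather the point made above about complexity.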

To illustrate the difference in performance, I did a simple test (using the 
paragraph above as test text):

    import re
    import timeit

    def using_re_finditer(key, text):
        matches = []
        for match in re.finditer(re.escape(key), text):
            matches.append((match.start(), match.end()))
        return matches


    def using_simple_loop(key, text):
        matches = []
        for i in range(len(text)):
            if text[i:].startswith(key):
                matches.append((i, i + len(key)))
        return matches


    CORPUS = """Searching for a string in another string, in a performant way, is
    not as simple as it first appears. Your version works correctly, but slowly.
    In some situations it doesn't matter, but in other cases it will. For better
    performance, string searching algorithms jump ahead either when they found a
    match or when they know for sure there isn't a match for some time (see e.g.
    the Boyer–Moore string-search algorithm). You could write such a more
    efficient algorithm, but then it becomes more complex and more error-prone.
    Using a well-tested existing function becomes quite attractive."""
    KEY = 'in'
    print('using_simple_loop:',
          timeit.repeat(stmt='using_simple_loop(KEY, CORPUS)', globals=globals(),
                        number=1000))
    print('using_re_finditer:',
          timeit.repeat(stmt='using_re_finditer(KEY, CORPUS)', globals=globals(),
                        number=1000))

This does 5 runs of 1000 repetitions each, and reports the time in seconds for 
each of those runs.
Result on my machine:

 using_simple_loop: [0.1395295020792, 0.1306313000456, 
0.1280345001249, 0.1318618002423, 0.1308461032626]
 using_re_finditer: [0.00386140005233, 0.00406190124297, 
0.00347899970256, 0.00341310216218, 0.003732001273]

We find that in this test re.finditer() is more than 30 times faster (despite 
the overhead of regular expressions).

While 

Re: How to escape strings for re.finditer?

2023-02-28 Thread Thomas Passin

On 2/28/2023 1:07 PM, Jen Kris wrote:


Using str.startswith is a cool idea in this case.  But is it better than 
regex for performance or reliability?  Regex syntax is not a model of 
simplicity, but in my simple case it's not too difficult.


The trouble is that we don't know what your case really is.  If you are 
talking about a short pattern like your example and a small text to 
search, and you don't need to do it too often, then my little code 
example is probably ideal. Reliability wouldn't be an issue, and 
performance would not be relevant.  If your case is going to be much 
larger, called many times in a loop, or be much more complicated in some 
other way, then a regex or some other approach is likely to be much faster.




Feb 27, 2023, 18:52 by li...@tompassin.net:

On 2/27/2023 9:16 PM, avi.e.gr...@gmail.com wrote:

And, just for fun, since there is nothing wrong with your code,
this minor change is terser:

>>> example = 'X - abc_degree + 1 + qq + abc_degree + 1'
>>> for match in re.finditer(re.escape('abc_degree + 1'), example):
...     print(match.start(), match.end())
...
4 18
26 40


Just for more fun :) -

Without knowing how general your expressions will be, I think the
following version is very readable, certainly more readable than
regexes:

example = 'X - abc_degree + 1 + qq + abc_degree + 1'
KEY = 'abc_degree + 1'

for i in range(len(example)):
    if example[i:].startswith(KEY):
        print(i, i + len(KEY))
# prints:
4 18
26 40

If you may have variable numbers of spaces around the symbols, OTOH,
the whole situation changes and then regexes would almost certainly
be the best approach. But the regular expression strings would
become harder to read.
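
A sketch of what that could look like if the spacing around the symbols varies. The sample string and the way the key is split into pieces are assumptions made for the example.

```python
import re

example = 'X - abc_degree+1 + qq + abc_degree   +  1'

# Escape each literal piece, then allow any amount of whitespace between them.
pieces = ['abc_degree', '+', '1']
pattern = r'\s*'.join(re.escape(p) for p in pieces)

for m in re.finditer(pattern, example):
    print(m.start(), m.end(), repr(m.group(0)))
```

Building the pattern from escaped pieces keeps the source readable even though the resulting regular expression string is not.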
--
https://mail.python.org/mailman/listinfo/python-list


Re: How to escape strings for re.finditer?

2023-02-28 Thread Jon Ribbens via Python-list
On 2023-02-28, Thomas Passin  wrote:
> On 2/28/2023 10:05 AM, Roel Schroeven wrote:
>> Op 28/02/2023 om 14:35 schreef Thomas Passin:
>>> On 2/28/2023 4:33 AM, Roel Schroeven wrote:
>>>> [...]
>>>> (2) Searching for a string in another string, in a performant way, is 
>>>> not as simple as it first appears. Your version works correctly, but 
>>>> slowly. In some situations it doesn't matter, but in other cases it 
>>>> will. For better performance, string searching algorithms jump ahead 
>>>> either when they found a match or when they know for sure there isn't 
>>>> a match for some time (see e.g. the Boyer–Moore string-search 
>>>> algorithm). You could write such a more efficient algorithm, but then 
>>>> it becomes more complex and more error-prone. Using a well-tested 
>>>> existing function becomes quite attractive.
>>>
>>> Sure, it all depends on what the real task will be.  That's why I 
>>> wrote "Without knowing how general your expressions will be". For the 
>>> example string, it's unlikely that speed will be a factor, but who 
>>> knows what target strings and keys will turn up in the future?
>> In hindsight I think it was overthinking things a bit. "It all depends 
>> on what the real task will be" you say, and indeed I think that should 
>> be the main conclusion here.
>
> It is interesting, though, how pre-processing the search pattern can 
> improve search times if you can afford the pre-processing.  Here's a 
> paper on rapidly finding matches when there may be up to one misspelled 
> character.  It's easy enough to implement, though in Python you can't 
> take the additional step of tuning it to stay in cache.
>
> https://Robert.Muth.Org/Papers/1996-Approx-Multi.Pdf

You've somehow title-cased that URL. The correct URL is:

https://robert.muth.org/Papers/1996-approx-multi.pdf
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: How to escape strings for re.finditer?

2023-02-28 Thread Jen Kris via Python-list

I wrote my previous message before reading this.  Thank you for the test you 
ran -- it answers the question of performance.  You show that re.finditer is 
30x faster, so that certainly recommends that over a simple loop, which 
introduces looping overhead.  


Feb 28, 2023, 05:44 by li...@tompassin.net:

> On 2/28/2023 4:33 AM, Roel Schroeven wrote:
>
>> Op 28/02/2023 om 3:44 schreef Thomas Passin:
>>
>>> On 2/27/2023 9:16 PM, avi.e.gr...@gmail.com wrote:
>>>
>>>> And, just for fun, since there is nothing wrong with your code, this minor 
>>>> change is terser:
>>>>
>>>> >>> example = 'X - abc_degree + 1 + qq + abc_degree + 1'
>>>> >>> for match in re.finditer(re.escape('abc_degree + 1'), example):
>>>> ...     print(match.start(), match.end())
>>>> ...
>>>> 4 18
>>>> 26 40

>>>
>>> Just for more fun :) -
>>>
>>> Without knowing how general your expressions will be, I think the following 
>>> version is very readable, certainly more readable than regexes:
>>>
>>> example = 'X - abc_degree + 1 + qq + abc_degree + 1'
>>> KEY = 'abc_degree + 1'
>>>
>>> for i in range(len(example)):
>>>     if example[i:].startswith(KEY):
>>>         print(i, i + len(KEY))
>>> # prints:
>>> 4 18
>>> 26 40
>>>
>> I think it's often a good idea to use a standard library function instead of 
>> rolling your own. The issue becomes less clear-cut when the standard library 
>> doesn't do exactly what you need (as here, where re.finditer() uses regular 
>> expressions while the use case only uses simple search strings). Ideally 
>> there would be a str.finditer() method we could use, but in the absence of 
>> that I think we still need to consider using the almost-but-not-quite 
>> fitting re.finditer().
>>
>> Two reasons:
>>
>> (1) I think it's clearer: the name tells us what it does (though of course 
>> we could solve this in a hand-written version by wrapping it in a suitably 
>> named function).
>>
>> (2) Searching for a string in another string, in a performant way, is not as 
>> simple as it first appears. Your version works correctly, but slowly. In 
>> some situations it doesn't matter, but in other cases it will. For better 
>> performance, string searching algorithms jump ahead either when they found a 
>> match or when they know for sure there isn't a match for some time (see e.g. 
>> the Boyer–Moore string-search algorithm). You could write such a more 
>> efficient algorithm, but then it becomes more complex and more error-prone. 
>> Using a well-tested existing function becomes quite attractive.
>>
>
> Sure, it all depends on what the real task will be.  That's why I wrote 
> "Without knowing how general your expressions will be". For the example 
> string, it's unlikely that speed will be a factor, but who knows what target 
> strings and keys will turn up in the future?
>
>> To illustrate the difference in performance, I did a simple test (using the 
>> paragraph above as test text):
>>
>>      import re
>>      import timeit
>>
>>      def using_re_finditer(key, text):
>>          matches = []
>>          for match in re.finditer(re.escape(key), text):
>>              matches.append((match.start(), match.end()))
>>          return matches
>>
>>
>>      def using_simple_loop(key, text):
>>          matches = []
>>          for i in range(len(text)):
>>              if text[i:].startswith(key):
>>                  matches.append((i, i + len(key)))
>>          return matches
>>
>>
>>      CORPUS = """Searching for a string in another string, in a performant way, is
>>      not as simple as it first appears. Your version works correctly, but slowly.
>>      In some situations it doesn't matter, but in other cases it will. For better
>>      performance, string searching algorithms jump ahead either when they found a
>>      match or when they know for sure there isn't a match for some time (see e.g.
>>      the Boyer–Moore string-search algorithm). You could write such a more
>>      efficient algorithm, but then it becomes more complex and more error-prone.
>>      Using a well-tested existing function becomes quite attractive."""
>>      KEY = 'in'
>>      print('using_simple_loop:',
>>            timeit.repeat(stmt='using_simple_loop(KEY, CORPUS)', globals=globals(),
>>                          number=1000))
>>      print('using_re_finditer:',
>>            timeit.repeat(stmt='using_re_finditer(KEY, CORPUS)', globals=globals(),
>>                          number=1000))
>>
>> This does 5 runs of 1000 repetitions each, and reports the time in seconds 
>> for each of those runs.
>> Result on my machine:
>>
>>      using_simple_loop: [0.1395295020792, 0.1306313000456, 
>> 0.1280345001249, 0.1318618002423, 0.1308461032626]
>>      using_re_finditer: [0.00386140005233, 0.00406190124297, 
>> 0.00347899970256, 0.00341310216218, 0.003732001273]
>>
>> We find that in this test re.finditer() is more than 30 times faster 
>> (despite the overhead of regular expressions).
>>
>> While speed isn't everything in programming, 

Re: How to escape strings for re.finditer?

2023-02-28 Thread Jen Kris via Python-list

Using str.startswith is a cool idea in this case.  But is it better than regex 
for performance or reliability?  Regex syntax is not a model of simplicity, but 
in my simple case it's not too difficult.  


Feb 27, 2023, 18:52 by li...@tompassin.net:

> On 2/27/2023 9:16 PM, avi.e.gr...@gmail.com wrote:
>
>> And, just for fun, since there is nothing wrong with your code, this minor 
>> change is terser:
>>
>> >>> example = 'X - abc_degree + 1 + qq + abc_degree + 1'
>> >>> for match in re.finditer(re.escape('abc_degree + 1'), example):
>> ...     print(match.start(), match.end())
>> ...
>> 4 18
>> 26 40
>>
>
> Just for more fun :) -
>
> Without knowing how general your expressions will be, I think the following 
> version is very readable, certainly more readable than regexes:
>
> example = 'X - abc_degree + 1 + qq + abc_degree + 1'
> KEY = 'abc_degree + 1'
>
> for i in range(len(example)):
>     if example[i:].startswith(KEY):
>         print(i, i + len(KEY))
> # prints:
> 4 18
> 26 40
>
> If you may have variable numbers of spaces around the symbols, OTOH, the 
> whole situation changes and then regexes would almost certainly be the best 
> approach.  But the regular expression strings would become harder to read.
> -- 
> https://mail.python.org/mailman/listinfo/python-list
>

-- 
https://mail.python.org/mailman/listinfo/python-list


RE: How to escape strings for re.finditer?

2023-02-28 Thread Jen Kris via Python-list
The code I sent is correct, and it runs here.  Maybe you received it with a 
carriage return removed, but on my copy after posting, it is correct:

import re

example = 'X - abc_degree + 1 + qq + abc_degree + 1'
find_string = re.escape('abc_degree + 1')
for match in re.finditer(find_string, example):
    print(match.start(), match.end())

One question:  several people have made suggestions other than regex (not the 
terser regex example you showed below).  Is there a reason why regex is not 
preferred to, for example, a list comp?  Performance?  Reliability?  

Feb 27, 2023, 18:16 by avi.e.gr...@gmail.com:

> Jen,
>
> Can you see what SOME OF US see as ASCII text? We can help you better if we 
> get code that can be copied and run as-is.
>
>  What you sent is not terse. It is wrong. It will not run on any python 
> interpreter because you somehow lost a carriage return and indent.
>
> This is what you sent:
>
> example = 'X - abc_degree + 1 + qq + abc_degree + 1'
> find_string = re.escape('abc_degree + 1') for match in 
> re.finditer(find_string, example):
>  print(match.start(), match.end())
>
> This is code indented properly:
>
> example = 'X - abc_degree + 1 + qq + abc_degree + 1'
> find_string = re.escape('abc_degree + 1')
> for match in re.finditer(find_string, example):
>     print(match.start(), match.end())
>
> Of course I am sure you wrote and ran code more like the latter version but 
> somewhere in your copy/paste process, 
>
> And, just for fun, since there is nothing wrong with your code, this minor 
> change is terser:
>
> >>> example = 'X - abc_degree + 1 + qq + abc_degree + 1'
> >>> for match in re.finditer(re.escape('abc_degree + 1'), example):
> ...     print(match.start(), match.end())
> ...
> 4 18
> 26 40
>
> But note once you use regular expressions, and not in your case, you might 
> match multiple things that are far from the same such as matching two 
> repeated words of any kind in any case including "and and" and "so so" or 
> finding words that have multiple doubled letter as in the  stereotypical 
> bookkeeper. In those cases, you may want even more than offsets but also show 
> the exact text that matched or even show some characters before and/or after 
> for context.
>
>
> -Original Message-
> From: Python-list  On 
> Behalf Of Jen Kris via Python-list
> Sent: Monday, February 27, 2023 8:36 PM
> To: Cameron Simpson 
> Cc: Python List 
> Subject: Re: How to escape strings for re.finditer?
>
>
> I haven't tested it either but it looks like it would work.  But for this 
> case I prefer the relative simplicity of:
>
> example = 'X - abc_degree + 1 + qq + abc_degree + 1'
> find_string = re.escape('abc_degree + 1') for match in 
> re.finditer(find_string, example):
>  print(match.start(), match.end())
>
> 4 18
> 26 40
>
> I don't insist on terseness for its own sake, but it's cleaner this way. 
>
> Jen
>
>
> Feb 27, 2023, 16:55 by c...@cskk.id.au:
>
>> On 28Feb2023 01:13, Jen Kris  wrote:
>>
>>> I went to the re module because the specified string may appear more than 
>>> once in the string (in the code I'm writing).
>>>
>>
>> Sure, but writing a `finditer` for plain `str` is pretty easy (untested):
>>
>>  pos = 0
>>  while True:
>>      found = s.find(substring, pos)
>>      if found < 0:
>>          break
>>      start = found
>>      end = found + len(substring)
>>      ... do whatever with start and end ...
>>      pos = end
>>
>> Many people go straight to the `re` module whenever they're looking for 
>> strings. It is often cryptic, error-prone overkill. Just something to keep in 
>> mind.
>>
>> Cheers,
>> Cameron Simpson 
>> --
>> https://mail.python.org/mailman/listinfo/python-list
>>
>
> -- 
> https://mail.python.org/mailman/listinfo/python-list
>

-- 
https://mail.python.org/mailman/listinfo/python-list


Re: How to escape strings for re.finditer?

2023-02-28 Thread Thomas Passin

On 2/28/2023 10:05 AM, Roel Schroeven wrote:

Op 28/02/2023 om 14:35 schreef Thomas Passin:

On 2/28/2023 4:33 AM, Roel Schroeven wrote:

[...]
(2) Searching for a string in another string, in a performant way, is 
not as simple as it first appears. Your version works correctly, but 
slowly. In some situations it doesn't matter, but in other cases it 
will. For better performance, string searching algorithms jump ahead 
either when they found a match or when they know for sure there isn't 
a match for some time (see e.g. the Boyer–Moore string-search 
algorithm). You could write such a more efficient algorithm, but then 
it becomes more complex and more error-prone. Using a well-tested 
existing function becomes quite attractive.


Sure, it all depends on what the real task will be.  That's why I 
wrote "Without knowing how general your expressions will be". For the 
example string, it's unlikely that speed will be a factor, but who 
knows what target strings and keys will turn up in the future?

In hindsight I think it was overthinking things a bit. "It all depends 
on what the real task will be" you say, and indeed I think that should 
be the main conclusion here.



It is interesting, though, how pre-processing the search pattern can 
improve search times if you can afford the pre-processing.  Here's a 
paper on rapidly finding matches when there may be up to one misspelled 
character.  It's easy enough to implement, though in Python you can't 
take the additional step of tuning it to stay in cache.


https://Robert.Muth.Org/Papers/1996-Approx-Multi.Pdf
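
As a small illustration of what pre-processing the pattern buys, here is a
sketch of exact matching in the style of Boyer-Moore-Horspool. It is not the
approximate-matching method from the paper above, and the names
build_shift_table and horspool_find_all are made up for this example. The
shift table is built once per key and then tells the search loop how far it
may safely jump.

```python
def build_shift_table(key):
    """Pre-process the key: for every character except the last one, record
    how far the window may jump when that character sits under the window's
    final position. Later occurrences overwrite earlier ones, which keeps
    the smallest (safe) jump."""
    last = len(key) - 1
    return {ch: last - i for i, ch in enumerate(key[:-1])}


def horspool_find_all(key, text):
    """Yield (start, end) for every occurrence of key in text,
    overlapping occurrences included."""
    if not key:
        return
    shift = build_shift_table(key)
    m, n = len(key), len(text)
    pos = 0
    while pos + m <= n:
        if text[pos:pos + m] == key:
            yield pos, pos + m
        # Characters that never occur in key[:-1] allow a full-length jump.
        pos += shift.get(text[pos + m - 1], m)


example = 'X - abc_degree + 1 + qq + abc_degree + 1'
for start, end in horspool_find_all('abc_degree + 1', example):
    print(start, end)
```

In pure Python this is unlikely to beat the built-in search functions; the
point is only to show where the pre-processing step fits.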



Re: How to escape strings for re.finditer?

2023-02-28 Thread Roel Schroeven

Op 28/02/2023 om 14:35 schreef Thomas Passin:

On 2/28/2023 4:33 AM, Roel Schroeven wrote:

[...]
(2) Searching for a string in another string, in a performant way, is 
not as simple as it first appears. Your version works correctly, but 
slowly. In some situations it doesn't matter, but in other cases it 
will. For better performance, string searching algorithms jump ahead 
either when they found a match or when they know for sure there isn't 
a match for some time (see e.g. the Boyer–Moore string-search 
algorithm). You could write such a more efficient algorithm, but then 
it becomes more complex and more error-prone. Using a well-tested 
existing function becomes quite attractive.


Sure, it all depends on what the real task will be.  That's why I 
wrote "Without knowing how general your expressions will be". For the 
example string, it's unlikely that speed will be a factor, but who 
knows what target strings and keys will turn up in the future?

In hindsight I think it was overthinking things a bit. "It all depends 
on what the real task will be" you say, and indeed I think that should 
be the main conclusion here.


--
"Man had always assumed that he was more intelligent than dolphins because
he had achieved so much — the wheel, New York, wars and so on — whilst all
the dolphins had ever done was muck about in the water having a good time.
But conversely, the dolphins had always believed that they were far more
intelligent than man — for precisely the same reasons."
-- Douglas Adams



Re: How to escape strings for re.finditer?

2023-02-28 Thread Thomas Passin

On 2/28/2023 4:33 AM, Roel Schroeven wrote:

Op 28/02/2023 om 3:44 schreef Thomas Passin:

On 2/27/2023 9:16 PM, avi.e.gr...@gmail.com wrote:
And, just for fun, since there is nothing wrong with your code, this 
minor change is terser:



>>> example = 'X - abc_degree + 1 + qq + abc_degree + 1'
>>> for match in re.finditer(re.escape('abc_degree + 1'), example):
...     print(match.start(), match.end())
...
4 18
26 40


Just for more fun :) -

Without knowing how general your expressions will be, I think the 
following version is very readable, certainly more readable than regexes:


example = 'X - abc_degree + 1 + qq + abc_degree + 1'
KEY = 'abc_degree + 1'

for i in range(len(example)):
    if example[i:].startswith(KEY):
        print(i, i + len(KEY))
# prints:
# 4 18
# 26 40

I think it's often a good idea to use a standard library function 
instead of rolling your own. The issue becomes less clear-cut when the 
standard library doesn't do exactly what you need (as here, where 
re.finditer() uses regular expressions while the use case only uses 
simple search strings). Ideally there would be a str.finditer() method 
we could use, but in the absence of that I think we still need to 
consider using the almost-but-not-quite fitting re.finditer().


Two reasons:

(1) I think it's clearer: the name tells us what it does (though of 
course we could solve this in a hand-written version by wrapping it in a 
suitably named function).


(2) Searching for a string in another string, in a performant way, is 
not as simple as it first appears. Your version works correctly, but 
slowly. In some situations it doesn't matter, but in other cases it 
will. For better performance, string searching algorithms jump ahead 
either when they found a match or when they know for sure there isn't a 
match for some time (see e.g. the Boyer–Moore string-search algorithm). 
You could write such a more efficient algorithm, but then it becomes 
more complex and more error-prone. Using a well-tested existing function 
becomes quite attractive.


Sure, it all depends on what the real task will be.  That's why I wrote 
"Without knowing how general your expressions will be". For the example 
string, it's unlikely that speed will be a factor, but who knows what 
target strings and keys will turn up in the future?


To illustrate the difference in performance, I did a simple test (using the 
paragraph above as test text):


    import re
    import timeit


    def using_re_finditer(key, text):
        matches = []
        for match in re.finditer(re.escape(key), text):
            matches.append((match.start(), match.end()))
        return matches


    def using_simple_loop(key, text):
        matches = []
        for i in range(len(text)):
            if text[i:].startswith(key):
                matches.append((i, i + len(key)))
        return matches


    CORPUS = """Searching for a string in another string, in a performant way, is
    not as simple as it first appears. Your version works correctly, but slowly.
    In some situations it doesn't matter, but in other cases it will. For better
    performance, string searching algorithms jump ahead either when they found a
    match or when they know for sure there isn't a match for some time (see e.g.
    the Boyer–Moore string-search algorithm). You could write such a more
    efficient algorithm, but then it becomes more complex and more error-prone.
    Using a well-tested existing function becomes quite attractive."""
    KEY = 'in'
    print('using_simple_loop:',
          timeit.repeat(stmt='using_simple_loop(KEY, CORPUS)', globals=globals(),
                        number=1000))
    print('using_re_finditer:',
          timeit.repeat(stmt='using_re_finditer(KEY, CORPUS)', globals=globals(),
                        number=1000))


This does 5 runs of 1000 repetitions each, and reports the time in 
seconds for each of those runs.

Result on my machine:

    using_simple_loop: [0.1395295020792, 0.1306313000456, 0.1280345001249,
                        0.1318618002423, 0.1308461032626]
    using_re_finditer: [0.00386140005233, 0.00406190124297, 0.00347899970256,
                        0.00341310216218, 0.003732001273]


We find that in this test re.finditer() is more than 30 times faster 
(despite the overhead of regular expressions).


While speed isn't everything in programming, with such a large 
difference in performance and (to me) no real disadvantages of using 
re.finditer(), I would prefer re.finditer() over writing my own.






Re: How to escape strings for re.finditer?

2023-02-28 Thread Roel Schroeven

Op 28/02/2023 om 3:44 schreef Thomas Passin:

On 2/27/2023 9:16 PM, avi.e.gr...@gmail.com wrote:
And, just for fun, since there is nothing wrong with your code, this 
minor change is terser:



>>> example = 'X - abc_degree + 1 + qq + abc_degree + 1'
>>> for match in re.finditer(re.escape('abc_degree + 1'), example):
...     print(match.start(), match.end())
...
4 18
26 40


Just for more fun :) -

Without knowing how general your expressions will be, I think the 
following version is very readable, certainly more readable than regexes:


example = 'X - abc_degree + 1 + qq + abc_degree + 1'
KEY = 'abc_degree + 1'

for i in range(len(example)):
    if example[i:].startswith(KEY):
        print(i, i + len(KEY))
# prints:
# 4 18
# 26 40

I think it's often a good idea to use a standard library function 
instead of rolling your own. The issue becomes less clear-cut when the 
standard library doesn't do exactly what you need (as here, where 
re.finditer() uses regular expressions while the use case only uses 
simple search strings). Ideally there would be a str.finditer() method 
we could use, but in the absence of that I think we still need to 
consider using the almost-but-not-quite fitting re.finditer().


Two reasons:

(1) I think it's clearer: the name tells us what it does (though of 
course we could solve this in a hand-written version by wrapping it in a 
suitably named function).


(2) Searching for a string in another string, in a performant way, is 
not as simple as it first appears. Your version works correctly, but 
slowly. In some situations it doesn't matter, but in other cases it 
will. For better performance, string searching algorithms jump ahead 
either when they found a match or when they know for sure there isn't a 
match for some time (see e.g. the Boyer–Moore string-search algorithm). 
You could write such a more efficient algorithm, but then it becomes 
more complex and more error-prone. Using a well-tested existing function 
becomes quite attractive.
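
For comparison, a middle road between the two approaches might look like the
sketch below. str has no finditer() method, so the name str_finditer is
invented here; the function is built on the existing str.find(), so the
scanning itself is left to the built-in search. It jumps past each match, so
like re.finditer() it reports non-overlapping matches only.

```python
def str_finditer(key, text):
    """Yield (start, end) for each non-overlapping occurrence of key in text.

    Hypothetical helper: it mimics re.finditer(re.escape(key), text) for
    plain search strings, using only str.find().
    """
    if not key:
        raise ValueError("key must not be empty")
    pos = 0
    while True:
        start = text.find(key, pos)
        if start < 0:
            return
        end = start + len(key)
        yield start, end
        pos = end  # continue searching after the end of this match


example = 'X - abc_degree + 1 + qq + abc_degree + 1'
for start, end in str_finditer('abc_degree + 1', example):
    print(start, end)
```

Wrapping the loop in a named generator also addresses reason (1): the call
site reads much like the re.finditer() version.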


To illustrate the difference in performance, I did a simple test (using the 
paragraph above as test text):


    import re
    import timeit


    def using_re_finditer(key, text):
        matches = []
        for match in re.finditer(re.escape(key), text):
            matches.append((match.start(), match.end()))
        return matches


    def using_simple_loop(key, text):
        matches = []
        for i in range(len(text)):
            if text[i:].startswith(key):
                matches.append((i, i + len(key)))
        return matches


    CORPUS = """Searching for a string in another string, in a performant way, is
    not as simple as it first appears. Your version works correctly, but slowly.
    In some situations it doesn't matter, but in other cases it will. For better
    performance, string searching algorithms jump ahead either when they found a
    match or when they know for sure there isn't a match for some time (see e.g.
    the Boyer–Moore string-search algorithm). You could write such a more
    efficient algorithm, but then it becomes more complex and more error-prone.
    Using a well-tested existing function becomes quite attractive."""
    KEY = 'in'
    print('using_simple_loop:',
          timeit.repeat(stmt='using_simple_loop(KEY, CORPUS)', globals=globals(),
                        number=1000))
    print('using_re_finditer:',
          timeit.repeat(stmt='using_re_finditer(KEY, CORPUS)', globals=globals(),
                        number=1000))


This does 5 runs of 1000 repetitions each, and reports the time in 
seconds for each of those runs.

Result on my machine:

    using_simple_loop: [0.1395295020792, 0.1306313000456, 0.1280345001249,
                        0.1318618002423, 0.1308461032626]
    using_re_finditer: [0.00386140005233, 0.00406190124297, 0.00347899970256,
                        0.00341310216218, 0.003732001273]


We find that in this test re.finditer() is more than 30 times faster 
(despite the overhead of regular expressions).
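
For anyone who wants to check that factor, the arithmetic on the timings
reported above can be sketched as follows. The variable names are only for
this illustration, and the fastest run of each series is compared, as the
timeit documentation suggests.

```python
# Timings copied from the results above (seconds per 1000 calls).
simple_loop = [0.1395295020792, 0.1306313000456, 0.1280345001249,
               0.1318618002423, 0.1308461032626]
re_finditer = [0.00386140005233, 0.00406190124297, 0.00347899970256,
               0.00341310216218, 0.003732001273]

# The minimum is the run least disturbed by other activity on the machine.
speedup = min(simple_loop) / min(re_finditer)
print(f"re.finditer() is about {speedup:.1f} times faster in this test")
```

With these numbers the ratio comes out between 37 and 38; on another machine
or another text it will differ.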


While speed isn't everything in programming, with such a large 
difference in performance and (to me) no real disadvantages of using 
re.finditer(), I would prefer re.finditer() over writing my own.


--
"The saddest aspect of life right now is that science gathers knowledge
faster than society gathers wisdom."
-- Isaac Asimov
