On 01May2021 05:30, David Mertz <me...@gnosis.cx> wrote:
>I was actually thinking about this before the recent "string comprehension"
>thread.  I wasn't really going to post the idea, but it's similar enough
>that I am nudged to.  Moreover, since PEP 616 added str.removeprefix() and
>str.removesuffix(), this feels like a natural extension of that.
>
>I find myself very often wanting to remove several substrings of similar
>lines to get at "the good bits" for my purpose.  Log files are a good
>example of this, but it arises in lots of other contexts I encounter.
>Let's take a not-absurd hypothetical:
>
>GET [http://example.com/picture] 200 image/jpeg
>POST [http://nowhere.org/data] 200 application/json
>PUT [https://example.org/page] 200 text/html
>
>For each of these lines, I'd like to see the URL and the MIME type only.
>The new str.removeprefix() helps some, but not as much as I would like
>since the "remove a tuple of prefixes" idea was rejected for PEP 616.  But
>even past that, very often much of what I want to remove is in the middle,
>not at the start or the end.

This is not a good way to tidy up log lines. Try parsing each line into fields:

    GET
    http://example.com/picture
    200
    image/jpeg

and then only looking at the fields you care about.

>I know I can use regular expressions here.  However, they are definitely a
>higher cognitive burden, and especially so for those who haven't taught
>them and written about them a lot, as I have.  Even for me, I'd rather not
>think about regexen if I don't *have to*.

Though for this, they are ok. Or even just:

    method, _url_, code, mimetype = line.split(None,3)

There shouldn't be any whitespace in a log line URL - it should be 
percent encoded.
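
As a minimal sketch of the field approach, assuming every line has exactly the four whitespace separated fields of the sample above (the name parse_log_line is made up for illustration, and str.removeprefix needs Python 3.9+):

```python
def parse_log_line(line):
    """Split one log line into (method, url, code, mimetype)."""
    # At most 3 splits, so anything after the status code stays in one field.
    method, url, code, mimetype = line.split(None, 3)
    # The sample lines wrap the URL in square brackets; drop them if present.
    url = url.removeprefix("[").removesuffix("]")
    # With maxsplit, the last field keeps trailing whitespace, so strip it.
    return method, url, int(code), mimetype.strip()


lines = [
    "GET [http://example.com/picture] 200 image/jpeg",
    "POST [http://nowhere.org/data] 200 application/json",
    "PUT [https://example.org/page] 200 text/html",
]
for line in lines:
    method, url, code, mimetype = parse_log_line(line)
    # Only look at the fields we care about.
    print(url, mimetype)
```

A line with fewer than four fields raises ValueError at the unpacking, which is probably what you want for a malformed log line.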

>So probably I'll do something
>like this:
>
>for line in lines:
>    for noise in ('GET', 'POST', 'PUT', '200', '[', ']'):
>        line = line.replace(noise, '')

This is a very bad way to do this. What about the URL 
"http://example.com/foo/PUT/bah"? Badness ensues. It's worse than using 
a well written regexp.
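
To make the failure concrete, here is a small sketch that runs the quoted loop against such a URL and compares it with an anchored regexp. The pattern and the name LOG_RE are my own assumptions about the line format, not anything standard:

```python
import re

line = "GET [http://example.com/foo/PUT/bah] 200 text/html"

# Blind substring removal, as in the quoted loop: it also eats the
# "PUT" in the middle of the URL path.
mangled = line
for noise in ("GET", "POST", "PUT", "200", "[", "]"):
    mangled = mangled.replace(noise, "")

# An anchored pattern only accepts the method at the start of the line
# and the status code in its own field, so the URL is left alone.
LOG_RE = re.compile(
    r"^(?P<method>[A-Z]+)\s+"
    r"\[(?P<url>[^\]\s]+)\]\s+"
    r"(?P<code>\d{3})\s+"
    r"(?P<mimetype>\S+)\s*$"
)
m = LOG_RE.match(line)
fields = (m.group("url"), m.group("mimetype"))

print(repr(mangled))  # the URL path has lost its "PUT" component
print(fields)         # the URL is intact
```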

>    process_line(line)
>
>That's not horrible, but it would be nicer to write:
>
>for line in lines:
>    process_line(line.remove(('GET', 'POST', 'PUT', '200', '[', ']')))

I'm -1 on this idea.

As you note, str.replace already exists and does what your line.remove 
does, just on a single substring basis. It's a trivial exercise to write 
an mreplace(s,substrs) function. Just do it and put it in your personal 
kit, and import it.
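
Something like this sketch would do. The optional replacement parameter is my own addition for illustration; the substrings are replaced one after another, in the order given:

```python
def mreplace(s, substrs, replacement=""):
    """Return s with every occurrence of each substring replaced.

    The replacements are applied sequentially, so the order of
    substrs matters if one substring overlaps another.
    """
    for substr in substrs:
        s = s.replace(substr, replacement)
    return s


line = "GET [http://example.com/picture] 200 image/jpeg"
print(mreplace(line, ("GET", "POST", "PUT", "200", "[", "]")))
```

It has exactly the same weakness as the loop it wraps: it will happily remove "PUT" or "200" from inside a URL.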

>Of course, if I really needed this as much as I seem to be suggesting, I
>know how to write a function `remove_strings()`... and I confess I have not
>done that. Or at least I haven't done it in some standard "my_utils" module
>I always import.  Nonetheless, a string method would feel even more natural
>than a function taking the string as an argument.

A method almost always feels easier or more natural, but how many do we 
really want on str? If you really want this, write a StrMixin with a 
bunch of nice methods, subclass str, and promote your lines to your new 
subclass. Methods managed!
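
A rough sketch of that, where the class name LogLine and the single remove() method are just illustrative choices:

```python
class StrMixin:
    """Extra string methods, meant to be mixed into a str subclass."""

    def remove(self, substrs):
        """Return a copy with every occurrence of each substring removed."""
        s = self
        for substr in substrs:
            # str.replace returns a plain str, even on a subclass instance.
            s = s.replace(substr, "")
        # Promote the result back to the subclass so methods can be chained.
        return type(self)(s)


class LogLine(StrMixin, str):
    pass


line = LogLine("PUT [https://example.org/page] 200 text/html")
cleaned = line.remove(("GET", "POST", "PUT", "200", "[", "]"))
print(cleaned.split())
```

The catch is that the inherited str methods (strip, replace and so on) return plain str, so you need to promote again after using them if you want to keep calling the mixin methods.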

Cheers,
Cameron Simpson <c...@cskk.id.au>
_______________________________________________
Python-ideas mailing list -- python-ideas@python.org
To unsubscribe send an email to python-ideas-le...@python.org
https://mail.python.org/mailman3/lists/python-ideas.python.org/
Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/NIOOPW2LJ754EVTAVEDQWFW2RTCD2CH7/
Code of Conduct: http://python.org/psf/codeofconduct/
