[Python-ideas] Re: Addition to fnmatch.py

2022-06-07 Thread MRAB

On 2022-06-07 11:03, Chris Angelico wrote:

On Tue, 7 Jun 2022 at 19:09, Ben Rudiak-Gould  wrote:


This calls the predicate once per element:

def partition(pred, iterable):
t1, t2 = tee((pred(x), x) for x in iterable)
return (x for b, x in t1 if not b), (x for b, x in t2 if b)

It's kind of inefficient though.


Honestly, if it weren't that there's currently a recipe in itertools,
I think the list-based version would be the best recipe to offer. The
cost of doing the job lazily AND maintaining all the other
expectations is too high; if you really need it to be lazy, it's
probably worth seeing if one of the other demands can be relaxed (eg
if it's fine to call the predicate twice).

For the OP's task, doing the partitioning eagerly wouldn't be an issue.

The problem with a lazy solution is that if you're producing 2 streams 
of output, as it were, and if you're consuming from only one of them, 
what are you going to do about the other one? You still need to store 
the items for it in case you want to consume them later, and if the 
original iterable is infinite, you have a problem...

___
Python-ideas mailing list -- python-ideas@python.org
To unsubscribe send an email to python-ideas-le...@python.org
https://mail.python.org/mailman3/lists/python-ideas.python.org/
Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/T3JM5DZSN55OEWINS2HYJAZJLOKLJWPJ/
Code of Conduct: http://python.org/psf/codeofconduct/


[Python-ideas] Re: Addition to fnmatch.py

2022-06-07 Thread Chris Angelico
On Tue, 7 Jun 2022 at 19:09, Ben Rudiak-Gould  wrote:
>
> This calls the predicate once per element:
>
> def partition(pred, iterable):
> t1, t2 = tee((pred(x), x) for x in iterable)
> return (x for b, x in t1 if not b), (x for b, x in t2 if b)
>
> It's kind of inefficient though.

Honestly, if it weren't that there's currently a recipe in itertools,
I think the list-based version would be the best recipe to offer. The
cost of doing the job lazily AND maintaining all the other
expectations is too high; if you really need it to be lazy, it's
probably worth seeing if one of the other demands can be relaxed (eg
if it's fine to call the predicate twice).

For the OP's task, doing the partitioning eagerly wouldn't be an issue.

ChrisA
___
Python-ideas mailing list -- python-ideas@python.org
To unsubscribe send an email to python-ideas-le...@python.org
https://mail.python.org/mailman3/lists/python-ideas.python.org/
Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/JA7C3XAP4H53VJBI2QR56XDJA2GQJN3W/
Code of Conduct: http://python.org/psf/codeofconduct/


[Python-ideas] Re: Addition to fnmatch.py

2022-06-07 Thread Ben Rudiak-Gould
This calls the predicate once per element:

def partition(pred, iterable):
t1, t2 = tee((pred(x), x) for x in iterable)
return (x for b, x in t1 if not b), (x for b, x in t2 if b)

It's kind of inefficient though.
___
Python-ideas mailing list -- python-ideas@python.org
To unsubscribe send an email to python-ideas-le...@python.org
https://mail.python.org/mailman3/lists/python-ideas.python.org/
Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/DQBFYD7KYRG6OOMXIY7IH45S25U6SXTK/
Code of Conduct: http://python.org/psf/codeofconduct/


[Python-ideas] Re: Addition to fnmatch.py

2022-06-06 Thread Chris Angelico
On Tue, 7 Jun 2022 at 10:26, Steven D'Aprano  wrote:
>
> Why don't you use the version from the itertools recipes?
>
>
> ```
> from itertools import tee, filterfalse
> def partition(pred, iterable):
> "Use a predicate to partition entries into false entries and true entries"
> # partition(is_odd, range(10)) --> 0 2 4 6 8   and  1 3 5 7 9
> t1, t2 = tee(iterable)
> return filterfalse(pred, t1), filter(pred, t2)
> ```
>
>

Calls the predicate twice for every element, so it's only good for a
guaranteed-deterministic predicate. You may want to read the thread
before responding.

ChrisA
___
Python-ideas mailing list -- python-ideas@python.org
To unsubscribe send an email to python-ideas-le...@python.org
https://mail.python.org/mailman3/lists/python-ideas.python.org/
Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/NS6ORDJ4STDGOJID5B5NQXZ4MVC5QORQ/
Code of Conduct: http://python.org/psf/codeofconduct/


[Python-ideas] Re: Addition to fnmatch.py

2022-06-06 Thread Steven D'Aprano
Why don't you use the version from the itertools recipes?


```
from itertools import tee, filterfalse
def partition(pred, iterable):
"Use a predicate to partition entries into false entries and true entries"
# partition(is_odd, range(10)) --> 0 2 4 6 8   and  1 3 5 7 9
t1, t2 = tee(iterable)
return filterfalse(pred, t1), filter(pred, t2)
```


-- 
Steve
___
Python-ideas mailing list -- python-ideas@python.org
To unsubscribe send an email to python-ideas-le...@python.org
https://mail.python.org/mailman3/lists/python-ideas.python.org/
Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/TOQKBCZNXZRH5BVFPPFOPJTVPA3KOZ54/
Code of Conduct: http://python.org/psf/codeofconduct/


[Python-ideas] Re: Addition to fnmatch.py

2022-06-06 Thread Steven D'Aprano
On Mon, Jun 06, 2022 at 06:17:32PM +0200, Benedict Verhegghe wrote:

> There still is something wrong. I get the second list twice:
> 
> odd, even = partition(lambda i: i % 2, range(20))
> print(list(odd))
> [1, 3, 5, 7, 9, 11, 13, 15, 17, 19]
> print(list(even))
> [0, 2, 4, 6, 8, 10, 12, 14, 16, 18, 0, 2, 4, 6, 8, 10, 12, 14, 16, 18]

Confirmed. But if you replace range(20) with `iter(range(20))`, it 
works correctly.

The plot thickens. The duplicated list is not from the second list 
created, but the second list *evaluated*. So if you run:

odd, even = partition(...)

as before, but evaluate *even* first and odd second:

print(list(even))
print(list(odd))

it is odd that is doubled, not even.


-- 
Steve
___
Python-ideas mailing list -- python-ideas@python.org
To unsubscribe send an email to python-ideas-le...@python.org
https://mail.python.org/mailman3/lists/python-ideas.python.org/
Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/OJIA6HIH6UJYE7X7LU7W46QI2X6UPTK6/
Code of Conduct: http://python.org/psf/codeofconduct/


[Python-ideas] Re: Addition to fnmatch.py

2022-06-06 Thread Chris Angelico
On Tue, 7 Jun 2022 at 01:58, David Mertz, Ph.D.  wrote:
> In [2]: def isEven(n):
>...: return n/2 == n//2
>...:
>

Huh. That's a very strange way to write that predicate.

>>> isEven(2)
True
>>> isEven(20001)
True
>>> isEven(20002)
False
>>> isEven(20003)
False
>>> isEven(20004)
True

Why not just "return n % 2 == 0" ?

ChrisA
___
Python-ideas mailing list -- python-ideas@python.org
To unsubscribe send an email to python-ideas-le...@python.org
https://mail.python.org/mailman3/lists/python-ideas.python.org/
Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/HDNXSZI24RIMJDHWOMPIWQXN4EZUA4Y3/
Code of Conduct: http://python.org/psf/codeofconduct/


[Python-ideas] Re: Addition to fnmatch.py

2022-06-06 Thread Benedict Verhegghe




Op 6/06/2022 om 18:07 schreef Mathew Elman:

oops, you're right, I have ended up with an unnecessary "not" in there, should 
be:

from collections import deque


def partition(pred, iterable):
 results = deque([]), deque([])

 def gen_split(only):
 for thing in iterable:
 if results[only]:
 yield results[only].popleft()
 results[pred(thing)].append(thing)
 for thing in results[only]:
 yield thing

 return gen_split(True), gen_split(False)


There still is something wrong. I get the second list twice:

odd, even = partition(lambda i: i % 2, range(20))
print(list(odd))
[1, 3, 5, 7, 9, 11, 13, 15, 17, 19]
print(list(even))
[0, 2, 4, 6, 8, 10, 12, 14, 16, 18, 0, 2, 4, 6, 8, 10, 12, 14, 16, 18]

___
Python-ideas mailing list -- python-ideas@python.org
To unsubscribe send an email to python-ideas-le...@python.org
https://mail.python.org/mailman3/lists/python-ideas.python.org/
Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/JYGSSXPN6FLGHVJPUM4XTPH2EABZZMLS/
Code of Conduct: http://python.org/psf/codeofconduct/


[Python-ideas] Re: Addition to fnmatch.py

2022-06-06 Thread Mathew Elman
oops, you're right, I have ended up with an unnecessary "not" in there, should 
be:

from collections import deque


def partition(pred, iterable):
results = deque([]), deque([])

def gen_split(only):
for thing in iterable:
if results[only]:
yield results[only].popleft()
results[pred(thing)].append(thing)
for thing in results[only]:
yield thing

return gen_split(True), gen_split(False)
___
Python-ideas mailing list -- python-ideas@python.org
To unsubscribe send an email to python-ideas-le...@python.org
https://mail.python.org/mailman3/lists/python-ideas.python.org/
Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/QURICBHMB3D2WG2VWEBJ56Q6ZZHKBUJX/
Code of Conduct: http://python.org/psf/codeofconduct/


[Python-ideas] Re: Addition to fnmatch.py

2022-06-06 Thread David Mertz, Ph.D.
This is really nice, but I think you have a negation wrong in there
somewhere:

In [1]: from collections import deque
   ...:
   ...: def partition(pred, iterable):
   ...: results = deque([]), deque([])
   ...:
   ...: def gen_split(only):
   ...: for thing in iterable:
   ...: if results[only]:
   ...: yield results[only].popleft()
   ...: results[not pred(thing)].append(thing)
   ...: for thing in results[only]:
   ...: yield thing
   ...:
   ...: return gen_split(True), gen_split(False)
   ...:

In [2]: def isEven(n):
   ...: return n/2 == n//2
   ...:

In [3]: from itertools import count

In [4]: evens, odds = partition(isEven, count())

In [5]: next(evens)
Out[5]: 1

In [6]: next(evens)
Out[6]: 3

In [7]: next(evens)
Out[7]: 5

In [8]: next(odds)
Out[8]: 0

In [9]: next(odds)
Out[9]: 2

In [10]: next(odds)
Out[10]: 4

On Mon, Jun 6, 2022 at 10:32 AM Mathew Elman  wrote:

> from collections import deque
>
> def partition(pred, iterable):
> results = deque([]), deque([])
>
> def gen_split(only):
> for thing in iterable:
> if results[only]:
> yield results[only].popleft()
> results[not pred(thing)].append(thing)
> for thing in results[only]:
> yield thing
>
> return gen_split(True), gen_split(False)
> ___
> Python-ideas mailing list -- python-ideas@python.org
> To unsubscribe send an email to python-ideas-le...@python.org
> https://mail.python.org/mailman3/lists/python-ideas.python.org/
> Message archived at
> https://mail.python.org/archives/list/python-ideas@python.org/message/EUAYPXMZM4DOGTAAUURNOMWTDU65K6UY/
> Code of Conduct: http://python.org/psf/codeofconduct/
>


-- 
Keeping medicines from the bloodstreams of the sick; food
from the bellies of the hungry; books from the hands of the
uneducated; technology from the underdeveloped; and putting
advocates of freedom in prisons.  Intellectual property is
to the 21st century what the slave trade was to the 16th.
___
Python-ideas mailing list -- python-ideas@python.org
To unsubscribe send an email to python-ideas-le...@python.org
https://mail.python.org/mailman3/lists/python-ideas.python.org/
Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/HWKJMLZSQVZQERYAKYZUAMK2YGIFUPIX/
Code of Conduct: http://python.org/psf/codeofconduct/


[Python-ideas] Re: Addition to fnmatch.py

2022-06-06 Thread John Carter
Thanks for all the comments. 
 I did consider using sets but order was important and when partitioning a list 
of general strings, not file names, there could be identical  and important 
strings (strings at different positions could mean different things, especially 
when generating test and training sets). 
I also tried doing it twice (generating lists) and it took approximately twice 
as long though generating two iterables could defer the partitioning till it 
was needed, i.e. lazy evaluation.

I’ve added some of the solutions to by timing test. Identified by capital 
letters at the end.

Reference list lengths 144004  855996
Number of tests cases 100
Example data ['xDy7AWbXau', 'TXlzsZV3Ra', 'YJh8uD9ovK', 'aRJ2U7nWs8', 
'geu.vHlogu']
FNmatch0.671875 268756  0 2687560
WCmatch1.562500 264939  0 2649390
IWCmatch   1.281250  0  0  01
Easy   0.093750 144004 855996 1001
Re 0.328125 855996 144004 1000
Positive   0.281250 144004  0 1440041
Negative   0.328125  0 855996 8559961
UppeeCase  0.375000 268756  0 2687560
Null   0.00  0  0  01
Both   0.328125 144004 855996 1001
Partition  1.171875 855996 144004 1000
IBoth  0.328125 855996 144004 1000
MRAB   0.437500 144004 855996 1001
METZ   0.50 144004 855996 1001
CA 0.343750 144004 855996 1001
SJB0.328125 144004 855996 1001

I checked for order and interestingly all the set solutions preserve it. I 
think this is because there are no duplicates in the test data. Order is only 
checked correctly if thee are the same number of elements in the test and 
reference lists

I also tried more_itertoolls.partition. Nearly 4 times slower.
John
___
Python-ideas mailing list -- python-ideas@python.org
To unsubscribe send an email to python-ideas-le...@python.org
https://mail.python.org/mailman3/lists/python-ideas.python.org/
Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/6WFSIXUF67KU5QXOODCBG7RDDG2KSYSK/
Code of Conduct: http://python.org/psf/codeofconduct/


[Python-ideas] Re: Addition to fnmatch.py

2022-06-06 Thread Mathew Elman
from collections import deque

def partition(pred, iterable):
results = deque([]), deque([])

def gen_split(only):
for thing in iterable:
if results[only]:
yield results[only].popleft()
results[not pred(thing)].append(thing)
for thing in results[only]:
yield thing

return gen_split(True), gen_split(False)
___
Python-ideas mailing list -- python-ideas@python.org
To unsubscribe send an email to python-ideas-le...@python.org
https://mail.python.org/mailman3/lists/python-ideas.python.org/
Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/EUAYPXMZM4DOGTAAUURNOMWTDU65K6UY/
Code of Conduct: http://python.org/psf/codeofconduct/


[Python-ideas] Re: Addition to fnmatch.py

2022-06-06 Thread David Mertz, Ph.D.
As Chris Angelico notes, that's only good for finite iterables.

The recipe in the itertools docs is unbounded, but it's also unbounded in
the cache needed if you get long strings of things the do (or don't)
confirm the predicate.

On Mon, Jun 6, 2022 at 7:06 AM Stephen J. Turnbull <
stephenjturnb...@gmail.com> wrote:

> Christopher Barker writes:
>
>  > How do you divide a sequence into two sequences cleanly?
>
> I don't know how to do it in a one liner, but
>
> yes = []
> no = []
> for elem in seq:
> (yes if test(elem) else no).append(elem)
>
> seems pretty clean to me.
>
> ___
> Python-ideas mailing list -- python-ideas@python.org
> To unsubscribe send an email to python-ideas-le...@python.org
> https://mail.python.org/mailman3/lists/python-ideas.python.org/
> Message archived at
> https://mail.python.org/archives/list/python-ideas@python.org/message/BBVBDT7KJTTOD5VQYIHBCQBU5QV32ZUB/
> Code of Conduct: http://python.org/psf/codeofconduct/
>


-- 
Keeping medicines from the bloodstreams of the sick; food
from the bellies of the hungry; books from the hands of the
uneducated; technology from the underdeveloped; and putting
advocates of freedom in prisons.  Intellectual property is
to the 21st century what the slave trade was to the 16th.
___
Python-ideas mailing list -- python-ideas@python.org
To unsubscribe send an email to python-ideas-le...@python.org
https://mail.python.org/mailman3/lists/python-ideas.python.org/
Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/4RIL6JUJXJDF3G6HWVU4DD33TKPEZYCI/
Code of Conduct: http://python.org/psf/codeofconduct/


[Python-ideas] Re: Addition to fnmatch.py

2022-06-06 Thread Stephen J. Turnbull
Christopher Barker writes:

 > How do you divide a sequence into two sequences cleanly?

I don't know how to do it in a one liner, but

yes = []
no = []
for elem in seq:
(yes if test(elem) else no).append(elem)

seems pretty clean to me.

___
Python-ideas mailing list -- python-ideas@python.org
To unsubscribe send an email to python-ideas-le...@python.org
https://mail.python.org/mailman3/lists/python-ideas.python.org/
Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/BBVBDT7KJTTOD5VQYIHBCQBU5QV32ZUB/
Code of Conduct: http://python.org/psf/codeofconduct/


[Python-ideas] Re: Addition to fnmatch.py

2022-06-06 Thread Chris Angelico
On Mon, 6 Jun 2022 at 18:22, Paul Moore  wrote:
>
> There’s an itertools recipe, “partition”.
>

Yes, there is (and plenty of equivalents in other languages). But it's
still not what I'd call "particularly easy". What that recipe does is
filter the same iterable with the same predicate twice, so it depends
on the predicate being repeatable. You can't use it, for instance, to
split a data set into training and test sets, even though conceptually
this should work:

training, test = partition(lambda x: random.random() < 0.9, dataset)

To truly perform a one-pass partitioning operation, you'd have to do
something like this:

def partition(pred, iter):
results = [], []
for thing in iter:
results[not pred(thing)].append(thing)
return results

Obviously that works only on finite iterables, so it's not appropriate
for an itertools recipe, but it does guarantee that the predicate is
called once per item and each item ends up in precisely one list.

It would be kinda nice to be able to do something like this with a
comprehension, since there's a fair bit of overhead to calling a
function (predicate functions have to be called for each element, but
a comprehension wraps everything up into one function). But since it's
not, the above is good enough for the cases where it's needed - not
particularly easy, but not all that hard either.

ChrisA
___
Python-ideas mailing list -- python-ideas@python.org
To unsubscribe send an email to python-ideas-le...@python.org
https://mail.python.org/mailman3/lists/python-ideas.python.org/
Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/BUFIMKKIJOPJQQ4WT2VLNXTTLAPQGCIN/
Code of Conduct: http://python.org/psf/codeofconduct/


[Python-ideas] Re: Addition to fnmatch.py

2022-06-06 Thread Paul Moore
There’s an itertools recipe, “partition”.

On Mon, 6 Jun 2022 at 08:28, Chris Angelico  wrote:

> On Mon, 6 Jun 2022 at 16:42, Christopher Barker 
> wrote:
> >
> > I think y’all have addressed the OP’s problem — at least the performance
> issues.
> >
> > But I think this brings up a pattern that I don’t have a nifty way to
> address:
> >
> > How do you divide a sequence into two sequences cleanly?
> >
> > The filter pattern: selectively remove items from a sequence, returning
> a new sequence. There are a few ways to do that in Python.
> >
> > But what if you need both the remaining items and the removed ones? Easy
> enough to write a loop that populates two lists, but is there a nifty
> one-liner?
> >
> > Is this a known functional pattern?
> >
>
> I'd call that a "partitioning" task, and you're right, it's not
> particularly easy in Python.
>
> ChrisA
> ___
> Python-ideas mailing list -- python-ideas@python.org
> To unsubscribe send an email to python-ideas-le...@python.org
> https://mail.python.org/mailman3/lists/python-ideas.python.org/
> Message archived at
> https://mail.python.org/archives/list/python-ideas@python.org/message/NEY4JKQMDTMS6PN3I6VK6FCICRLCG43P/
> Code of Conduct: http://python.org/psf/codeofconduct/
>
___
Python-ideas mailing list -- python-ideas@python.org
To unsubscribe send an email to python-ideas-le...@python.org
https://mail.python.org/mailman3/lists/python-ideas.python.org/
Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/7TFCBUT6UD67UVYNUUTKUP26A5FGSUDO/
Code of Conduct: http://python.org/psf/codeofconduct/


[Python-ideas] Re: Addition to fnmatch.py

2022-06-06 Thread Chris Angelico
On Mon, 6 Jun 2022 at 16:42, Christopher Barker  wrote:
>
> I think y’all have addressed the OP’s problem — at least the performance 
> issues.
>
> But I think this brings up a pattern that I don’t have a nifty way to address:
>
> How do you divide a sequence into two sequences cleanly?
>
> The filter pattern: selectively remove items from a sequence, returning a new 
> sequence. There are a few ways to do that in Python.
>
> But what if you need both the remaining items and the removed ones? Easy 
> enough to write a loop that populates two lists, but is there a nifty 
> one-liner?
>
> Is this a known functional pattern?
>

I'd call that a "partitioning" task, and you're right, it's not
particularly easy in Python.

ChrisA
___
Python-ideas mailing list -- python-ideas@python.org
To unsubscribe send an email to python-ideas-le...@python.org
https://mail.python.org/mailman3/lists/python-ideas.python.org/
Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/NEY4JKQMDTMS6PN3I6VK6FCICRLCG43P/
Code of Conduct: http://python.org/psf/codeofconduct/


[Python-ideas] Re: Addition to fnmatch.py

2022-06-06 Thread Christopher Barker
I think y’all have addressed the OP’s problem — at least the performance
issues.

But I think this brings up a pattern that I don’t have a nifty way to
address:

How do you divide a sequence into two sequences cleanly?

The filter pattern: selectively remove items from a sequence, returning a
new sequence. There are a few ways to do that in Python.

But what if you need both the remaining items and the removed ones? Easy
enough to write a loop that populates two lists, but is there a nifty
one-liner?

Is this a known functional pattern?

-CHB




On Sun, Jun 5, 2022 at 7:39 PM Eric V. Smith via Python-ideas <
python-ideas@python.org> wrote:

>
> On 6/5/2022 1:22 PM, Benedict Verhegghe wrote:
> > Op 5/06/2022 om 18:47 schreef David Mertz, Ph.D.:
> >> Sure, that's nice enough code and has the same big-O complexity. I
> >> suspect set difference is a little faster (by a constant multiple)
> >> because it hits C code more, but I haven't benchmarked.
> >>
> >> The OP said the elements were from fnmatch though, which explicitly
> >> does not promise order. So really it's just whether you like your
> >> code or this better aesthetically:
> >>
> >>  list(set(b) - set(a))
> >>
> > I benchmarked it and indeed the list difference is a little faster.
> > >>> timeit.timeit('s=set(b); [x for x in a if x not in s]', setup='a =
> > list(range(1)); b = list(range(1000))', number=1000)
> > 0.6156670850032242
> > >>> timeit.timeit('list(set(a)-set(b))', setup='a =
> > list(range(1)); b = list(range(1000))', number=1000)
> > 0.43649216600169893
> >
> > And what's more, it seems to also preserve order. I guess because a
> > set is implemented like a dict, and will preserve order of you only
> > remove some elements from the set.
> >
> A set doesn't guarantee order at all, and indeed it does not preserve
> creation order:
>
>  >>> s = set([10, 20, 1])
>  >>> s
> {1, 10, 20}
>
> You're seeing an accidental byproduct of the implementation.
>
> Eric
>
> ___
> Python-ideas mailing list -- python-ideas@python.org
> To unsubscribe send an email to python-ideas-le...@python.org
> https://mail.python.org/mailman3/lists/python-ideas.python.org/
> Message archived at
> https://mail.python.org/archives/list/python-ideas@python.org/message/WZFC3WAOMTQ2UQIMGBJ2VTBBOE5XUWBR/
> Code of Conduct: http://python.org/psf/codeofconduct/
>
-- 
Christopher Barker, PhD (Chris)

Python Language Consulting
  - Teaching
  - Scientific Software Development
  - Desktop GUI and Web Development
  - wxPython, numpy, scipy, Cython
___
Python-ideas mailing list -- python-ideas@python.org
To unsubscribe send an email to python-ideas-le...@python.org
https://mail.python.org/mailman3/lists/python-ideas.python.org/
Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/P2OTU27MUOV2SY2TUVFCQ47R4KBCMPFF/
Code of Conduct: http://python.org/psf/codeofconduct/


[Python-ideas] Re: Addition to fnmatch.py

2022-06-05 Thread Eric V. Smith via Python-ideas


On 6/5/2022 1:22 PM, Benedict Verhegghe wrote:

Op 5/06/2022 om 18:47 schreef David Mertz, Ph.D.:
Sure, that's nice enough code and has the same big-O complexity. I 
suspect set difference is a little faster (by a constant multiple) 
because it hits C code more, but I haven't benchmarked.


The OP said the elements were from fnmatch though, which explicitly 
does not promise order. So really it's just whether you like your 
code or this better aesthetically:


     list(set(b) - set(a))


I benchmarked it and indeed the list difference is a little faster.
>>> timeit.timeit('s=set(b); [x for x in a if x not in s]', setup='a = 
list(range(1)); b = list(range(1000))', number=1000)

0.6156670850032242
>>> timeit.timeit('list(set(a)-set(b))', setup='a = 
list(range(1)); b = list(range(1000))', number=1000)

0.43649216600169893

And what's more, it seems to also preserve order. I guess because a 
set is implemented like a dict, and will preserve order of you only 
remove some elements from the set.


A set doesn't guarantee order at all, and indeed it does not preserve 
creation order:


>>> s = set([10, 20, 1])
>>> s
{1, 10, 20}

You're seeing an accidental byproduct of the implementation.

Eric

___
Python-ideas mailing list -- python-ideas@python.org
To unsubscribe send an email to python-ideas-le...@python.org
https://mail.python.org/mailman3/lists/python-ideas.python.org/
Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/WZFC3WAOMTQ2UQIMGBJ2VTBBOE5XUWBR/
Code of Conduct: http://python.org/psf/codeofconduct/


[Python-ideas] Re: Addition to fnmatch.py

2022-06-05 Thread Benedict Verhegghe

Op 5/06/2022 om 18:47 schreef David Mertz, Ph.D.:
Sure, that's nice enough code and has the same big-O complexity. I 
suspect set difference is a little faster (by a constant multiple) 
because it hits C code more, but I haven't benchmarked.


The OP said the elements were from fnmatch though, which explicitly does 
not promise order. So really it's just whether you like your code or 
this better aesthetically:


     list(set(b) - set(a))


I benchmarked it and indeed the list difference is a little faster.
>>> timeit.timeit('s=set(b); [x for x in a if x not in s]', setup='a = 
list(range(1)); b = list(range(1000))', number=1000)

0.6156670850032242
>>> timeit.timeit('list(set(a)-set(b))', setup='a = list(range(1)); 
b = list(range(1000))', number=1000)

0.43649216600169893

And what's more, it seems to also preserve order. I guess because a set 
is implemented like a dict, and will preserve order of you only remove 
some elements from the set.


___
Python-ideas mailing list -- python-ideas@python.org
To unsubscribe send an email to python-ideas-le...@python.org
https://mail.python.org/mailman3/lists/python-ideas.python.org/
Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/YBEQ6TZ6CQDGRLZ6HFIV4HJ7LCFUW4UA/
Code of Conduct: http://python.org/psf/codeofconduct/


[Python-ideas] Re: Addition to fnmatch.py

2022-06-05 Thread David Mertz, Ph.D.
On Sun, Jun 5, 2022, 12:08 PM MRAB  wrote:

> On 2022-06-05 16:12, David Mertz, Ph.D. wrote:
> > This is exactly the problem solved by set difference. E.g. `{a, b, c}
> - {a, c}`.
> >
> > This operation is linear on the size of the removed set.
> >
> You don't have to use a set difference, which might matter if the order
> was important, but just make a set of the ones found:
>
>  s = set(b)
>
> And then the list comprehension:
>
>  [x for x in a if x not in s]
>

Sure, that's nice enough code and has the same big-O complexity. I suspect
set difference is a little faster (by a constant multiple) because it hits
C code more, but I haven't benchmarked.

The OP said the elements were from fnmatch though, which explicitly does
not promise order. So really it's just whether you like your code or this
better aesthetically:

list(set(b) - set(a))

Personally I like my (somewhat shorter) version as more clear. But YMMV.

>
___
Python-ideas mailing list -- python-ideas@python.org
To unsubscribe send an email to python-ideas-le...@python.org
https://mail.python.org/mailman3/lists/python-ideas.python.org/
Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/FD3JEDTKB4COFBRSLGIT53Y2GFW6MTZF/
Code of Conduct: http://python.org/psf/codeofconduct/


[Python-ideas] Re: Addition to fnmatch.py

2022-06-05 Thread MRAB

On 2022-06-05 16:12, David Mertz, Ph.D. wrote:
This is exactly the problem solved by set difference. E.g. `{a, b, c} - 
{a, c}`.


This operation is linear on the size of the removed set.

You don't have to use a set difference, which might matter if the order 
was important, but just make a set of the ones found:


s = set(b)

And then the list comprehension:

[x for x in a if x not in s]

On Sun, Jun 5, 2022, 11:01 AM John Carter > wrote:


I’d like to propose a simple addition to the 'fnmatch' module.
Specificity a function that returns a list of names that match a
pattern AND a list of those that don't.

In a recent project I found that I wished to split  a list of files
and move those that matched a pattern to one folder and those that
didn't to another folder. I was using fnmatch.filter to select the
first set of files and then a list comprehension to generate the
second set.

For a small number of files (~ 10) this was perfectly adequate.
However as I needed to process many files (>>1) the execution
time was very significant. Profiling the code showed that the time
was spent in generating the second set. I tried a number of
solutions including designing a negative filter, walking the file
system to find those files that had not been moved and using more
and more convoluted ways to improve the second selection. Eventually
I gave in and hacked a local copy of fnmatch.py as below:

def split(names, pat):
     """Return the subset of the list NAMES that match PAT."""
     """Also returns those names not in NAMES"""
     result = []
     notresult = []
     pat = os.path.normcase(pat)
     pattern_match = _compile_pattern(pat)
     if os.path is posixpath:
         # normcase on posix is NOP. Optimize it away from the loop.
         for name in names:
             if not pattern_match(name):
                 result.append(name)
             else:
                 notresult.append(name)
     else:
         for name in names:
             if not pattern_match(os.path.normcase(name)):
                 result.append(name)
             else:
                 notresult.append(name)
     return result, notresult

The change is the addition of else clauses to the if not pattermath
statements. This solved the problem and benchmarking showed that it
only took a very small additional time (20ms for a million strings)
to generate both lists

Number of tests cases 100
Example data ['Ba1txmKkiC', 'KlJx.f_AGj', 'Umwbw._Wa9', '4YlgA5LVpI’]
Search for '*A*'
Test                            Time(sec)       Positive        Negative
WCmatch.filter          1.953125           26211          0
filter                         0.328125    14259          0
split                  0.343750    14259      85741
List Comp.           270.468751     14259      85741

The list comprehension [x for x in a if x not in b]*, was nearly 900
times slower.

‘fnmatch’ was an appropriate solution to this problem as typing
‘glob’ style search patterns was easier than having to enter regular
expressions when prompted by my code.

I would like to propose that split, even though it is very simple,
be included in the 'fnmatch' module.

John

*a is the original and b is those that match.


___
Python-ideas mailing list -- python-ideas@python.org
To unsubscribe send an email to python-ideas-le...@python.org
https://mail.python.org/mailman3/lists/python-ideas.python.org/
Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/FRQJT7AWCVQXYJPC4IDWA3LZGVLX765A/
Code of Conduct: http://python.org/psf/codeofconduct/


[Python-ideas] Re: Addition to fnmatch.py

2022-06-05 Thread Jean Abou Samra



Le 05/06/2022 à 16:52, John Carter a écrit :

I’d like to propose a simple addition to the 'fnmatch' module. Specificity a 
function that returns a list of names that match a pattern AND a list of those 
that don't.

In a recent project I found that I wished to split  a list of files and move 
those that matched a pattern to one folder and those that didn't to another 
folder. I was using fnmatch.filter to select the first set of files and then a 
list comprehension to generate the second set.

For a small number of files (~ 10) this was perfectly adequate. However as I needed 
to process many files (>>1) the execution time was very significant. 
Profiling the code showed that the time was spent in generating the second set. I 
tried a number of solutions including designing a negative filter, walking the file 
system to find those files that had not been moved and using more and more convoluted 
ways to improve the second selection. Eventually I gave in and hacked a local copy of 
fnmatch.py as below:

def split(names, pat):
 """Return the subset of the list NAMES that match PAT."""
 """Also returns those names not in NAMES"""
 result = []
 notresult = []
 pat = os.path.normcase(pat)
 pattern_match = _compile_pattern(pat)
 if os.path is posixpath:
 # normcase on posix is NOP. Optimize it away from the loop.
 for name in names:
 if not pattern_match(name):
 result.append(name)
 else:
 notresult.append(name)
 else:
 for name in names:
 if not pattern_match(os.path.normcase(name)):
 result.append(name)
 else:
 notresult.append(name)
 return result, notresult

The change is the addition of else clauses to the if not pattermath statements. 
This solved the problem and benchmarking showed that it only took a very small 
additional time (20ms for a million strings) to generate both lists

Number of tests cases 100
Example data ['Ba1txmKkiC', 'KlJx.f_AGj', 'Umwbw._Wa9', '4YlgA5LVpI’]
Search for '*A*'
TestTime(sec)   PositiveNegative
WCmatch.filter  1.953125   26211  0
filter 0.32812514259  0
split  0.34375014259  85741
List Comp.   270.468751 14259  85741

The list comprehension [x for x in a if x not in b]*, was nearly 900 times 
slower.

‘fnmatch’ was an appropriate solution to this problem as typing ‘glob’ style 
search patterns was easier than having to enter regular expressions when 
prompted by my code.

I would like to propose that split, even though it is very simple, be included 
in the 'fnmatch' module.

John

*a is the original and b is those that match.




Searching an element in a list scans the elements
one by one, so [x for x in a if x not in b]
has quadratic complexity. Try list(set(a) - set(b))
instead.

>>> import timeit
>>> timeit.timeit('[x for x in a if x not in b]', setup='a = 
list(range(10_000)); b = list(range(1000))', number=100)

5.258457429998089
>>>
>>> timeit.timeit('list(set(a)-set(b))', setup='a = 
list(range(10_000)); b = list(range(1000))', number=100)

0.07163406799372751


Jean


___
Python-ideas mailing list -- python-ideas@python.org
To unsubscribe send an email to python-ideas-le...@python.org
https://mail.python.org/mailman3/lists/python-ideas.python.org/
Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/VPWSYBDEW7HZBLU3JBDPFPO6KI3VM4LC/
Code of Conduct: http://python.org/psf/codeofconduct/


[Python-ideas] Re: Addition to fnmatch.py

2022-06-05 Thread David Mertz, Ph.D.
This is exactly the problem solved by set difference. E.g. `{a, b, c} - {a,
c}`.

This operation is linear on the size of the removed set.

On Sun, Jun 5, 2022, 11:01 AM John Carter  wrote:

> I’d like to propose a simple addition to the 'fnmatch' module. Specificity
> a function that returns a list of names that match a pattern AND a list of
> those that don't.
>
> In a recent project I found that I wished to split  a list of files and
> move those that matched a pattern to one folder and those that didn't to
> another folder. I was using fnmatch.filter to select the first set of files
> and then a list comprehension to generate the second set.
>
> For a small number of files (~ 10) this was perfectly adequate. However as
> I needed to process many files (>>1) the execution time was very
> significant. Profiling the code showed that the time was spent in
> generating the second set. I tried a number of solutions including
> designing a negative filter, walking the file system to find those files
> that had not been moved and using more and more convoluted ways to improve
> the second selection. Eventually I gave in and hacked a local copy of
> fnmatch.py as below:
>
> def split(names, pat):
> """Return the subset of the list NAMES that match PAT."""
> """Also returns those names not in NAMES"""
> result = []
> notresult = []
> pat = os.path.normcase(pat)
> pattern_match = _compile_pattern(pat)
> if os.path is posixpath:
> # normcase on posix is NOP. Optimize it away from the loop.
> for name in names:
> if not pattern_match(name):
> result.append(name)
> else:
> notresult.append(name)
> else:
> for name in names:
> if not pattern_match(os.path.normcase(name)):
> result.append(name)
> else:
> notresult.append(name)
> return result, notresult
>
> The change is the addition of else clauses to the if not pattermath
> statements. This solved the problem and benchmarking showed that it only
> took a very small additional time (20ms for a million strings) to generate
> both lists
>
> Number of tests cases 100
> Example data ['Ba1txmKkiC', 'KlJx.f_AGj', 'Umwbw._Wa9', '4YlgA5LVpI’]
> Search for '*A*'
> TestTime(sec)   PositiveNegative
> WCmatch.filter  1.953125   26211  0
> filter 0.32812514259  0
> split  0.34375014259  85741
> List Comp.   270.468751 14259  85741
>
> The list comprehension [x for x in a if x not in b]*, was nearly 900 times
> slower.
>
> ‘fnmatch’ was an appropriate solution to this problem as typing ‘glob’
> style search patterns was easier than having to enter regular expressions
> when prompted by my code.
>
> I would like to propose that split, even though it is very simple, be
> included in the 'fnmatch' module.
>
> John
>
> *a is the original and b is those that match.
> ___
> Python-ideas mailing list -- python-ideas@python.org
> To unsubscribe send an email to python-ideas-le...@python.org
> https://mail.python.org/mailman3/lists/python-ideas.python.org/
> Message archived at
> https://mail.python.org/archives/list/python-ideas@python.org/message/EZEGFGJOHVHATKDBJ2SWZML62JWT2VE2/
> Code of Conduct: http://python.org/psf/codeofconduct/
>
___
Python-ideas mailing list -- python-ideas@python.org
To unsubscribe send an email to python-ideas-le...@python.org
https://mail.python.org/mailman3/lists/python-ideas.python.org/
Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/72KFWQJHIFUJJRH32WIGEZ64BRN7UK6E/
Code of Conduct: http://python.org/psf/codeofconduct/