Re: [Python-ideas] Add the imath module

2018-07-13 Thread Steven D'Aprano
On Fri, Jul 13, 2018 at 10:52:38AM -0500, Tim Peters wrote:

> Steven's numbers are pretty baffling to me, since these are all composite
> and so iterating Miller-Rabin "should get out" pretty fast:

That's because you haven't seen my code :-)

It's over-engineered, class-based, and written as a learning exercise. 
There's a compatibility layer so it will work back to Python 2.4 and an 
instrumentation layer which I inserted in a fit of enthusiasm to gather 
statistics, which I have never once looked at since.

And *even so* it is still fast enough for casual use at the interactive 
interpreter, compared to more naive algorithms with worse Big O 
performance characteristics.

You might think 5 seconds is slow, but I'm serious when I say some of 
the other algorithms I played with would take *days* to generate, 
or check, largish primes.


-- 
Steve
___
Python-ideas mailing list
Python-ideas@python.org
https://mail.python.org/mailman/listinfo/python-ideas
Code of Conduct: http://python.org/psf/codeofconduct/


Re: [Python-ideas] The future of Python parallelism. The GIL. Subinterpreters. Actors.

2018-07-13 Thread Eric Snow
On Tue, Jul 10, 2018 at 8:32 AM David Foster  wrote:
>
> I was not aware of PyParallel. The PyParallel "parallel thread"
> line-of-execution implementation is pretty interesting. Trent, big kudos
> to you on that effort.

+1  It's a neat project.  Trent's pretty smart. :)

> Since you're speaking in the past tense and said "but we're not doing it
> like that", I infer that the notion of a parallel thread was turned down
> for integration into CPython, as that appears to have been the original
> goal.
>
> However I am unable to locate a rationale for why that integration was
> turned down. Was it deemed to be too complex to execute, perhaps in the
> context of providing C extension compatibility? Was there a desire to
> see a similar implementation on Linux as well as Windows? Some other
> reason?

Trent can correct me if I'm wrong, but I believe it boiled down to
challenges with the POSIX implementation (that email thread implies
this as well), likely coupled with limited time for Trent to work on
it.

-eric


Re: [Python-ideas] The future of Python parallelism. The GIL. Subinterpreters. Actors.

2018-07-13 Thread Eric Snow
On Sun, Jul 8, 2018 at 12:30 PM David Foster  wrote:
> In the past I have personally viewed Python as difficult to use for
> parallel applications, which need to do multiple things simultaneously
> for increased performance:
>
> * The old Threads, Locks, & Shared State model is inefficient in Python
> due to the GIL, which limits CPU usage to only one thread at a time
> (ignoring certain functions implemented in C, such as I/O).
>
> * The Actor model can be used with some effort via the “multiprocessing”
> module, but it doesn’t seem that streamlined and forces there to be a
> separate OS process per line of execution, which is relatively expensive.

Yep, Python's multi-core story is a bit rough (Jython/IronPython
aside).  It's especially hard for folks used to
concurrency/parallelism in other languages.  I'm hopeful that we can
improve the situation.

> I was thinking it would be nice if there was a better way to implement
> the Actor model, with multiple lines of execution in the same process,

FWIW, at this point I'm a big fan of this concurrency model.  I find
it hurts my brain least. :)

> yet avoiding contention from the GIL. This implies a separate GIL for
> each line of execution (to eliminate contention) and a controlled way to
> exchange data between different lines of execution.
>
> So I was thinking of proposing a design for implementing such a system.
> Or at least get interested parties thinking about such a system.
>
> With some additional research I notice that [PEP 554] (“Multiple
> subinterpreters in the stdlib”) appears to be putting forward a design
> similar to the one I described. I notice however it mentions that
> subinterpreters currently share the GIL, which would seem to make them
> unusable for parallel scenarios due to GIL contention.

I'm glad you found PEP 554.  I wanted to keep the PEP focused on
exposing the existing subinterpreter support (and the basic,
CSP-inspired concurrency model), which is why it doesn't go into much
detail about changes to the CPython runtime that will allow GIL-free
multi-core parallelism.   As Nick mentioned, my talk at the language
summit covers my plans.

Improving Python's multi-core story has been the major focus of my
(sadly relatively small) contributions to CPython for several years
now.  I've made slow progress due to limited time, but things are
picking up, especially since I got a job in December at Microsoft that
allows me to work on CPython for part of each week.  On top of that,
several other people are directly helping now (including Emily
Morehouse) and I got a lot of positive feedback for the project at
PyCon this year.

> I'd like to solicit some feedback on what might be the most efficient
> way to make forward progress on efficient parallelization in Python
> inside the same OS process. The most promising areas appear to be:
>
> 1. Make the current subinterpreter implementation in Python have more
> complete isolation, sharing almost no state between subinterpreters. In
> particular not sharing the GIL. The "Interpreter Isolation" section of
> PEP 554 enumerates areas that are currently shared, some of which
> probably shouldn't be.

Right, this is the approach I'm driving.  At this point I have the
project broken down pretty well into manageable chunks.  You're
welcome to join in. :)  Regardless,  I'd be glad to discuss it with
you in more depth if you're interested.

> 2. Give up on making things work inside the same OS process and rather
> focus on implementing better abstractions on top of the existing
> multiprocessing API so that the actor model is easier to program
> against. For example, providing some notion of Channels to communicate
> between lines of execution, a way to monitor the number of Messages
> waiting in each channel for throughput profiling and diagnostics,
> Supervision, etc. In particular I could do this by using an existing
> library like Pykka or Thespian and extending it where necessary.

It may be worth a shot.  You should ask Davin Potts (CC'ed) about this.
We discussed this a little at PyCon.  I'm sure he'd welcome help in
improving the multiprocessing module.

-eric


Re: [Python-ideas] Add the imath module

2018-07-13 Thread Greg Ewing

Danilo J. S. Bellini wrote:

If we have a fast and exact primality test up to 2**64,
a signature like this would help:

def is_prime(n, probabilistic=False):


Two separate functions would be better, by the "no constant
boolean parameters" rule.
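Following that rule, the boolean-flag signature splits into two functions. A toy sketch (trial division stands in for the real tests; the function names are the ones under discussion in the thread, not an existing API):

```python
def is_prime(n):
    """Exact primality test; may be slow for very large n."""
    if n < 2:
        return False
    i = 2
    while i * i <= n:          # trial division, a stand-in for a real
        if n % i == 0:         # deterministic test
            return False
        i += 1
    return True

def is_probable_prime(n):
    """Probabilistic test; fast, with a tiny false-positive chance."""
    # Stand-in: a real version would run Miller-Rabin rounds instead.
    return is_prime(n)
```

Callers then state their intent in the call site itself, which is the point of the rule.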

--
Greg


Re: [Python-ideas] Add the imath module

2018-07-13 Thread Greg Ewing

Stephan Houben wrote:
Should we then call sort "probable_sort", since a non-zero probability
exists of it going wrong due to a stray cosmic ray?


And in the interests of full disclosure, we should call Python
a probable language, running on a probable computer.

--
Probable Greg


Re: [Python-ideas] Fwd: grouping / dict of lists

2018-07-13 Thread Chris Barker via Python-ideas
On Fri, Jul 13, 2018 at 12:38 PM, Michael Selik  wrote:

> Thanks for linking to these.
>

yup -- real use cases are really helpful.

Though the other paradigm for grouping is use of setdefault() rather than
defaultdict. So it would be nice to look for those, too.


> I looked at many of them in my own research, but for some reason didn't
> think to write down the links. I'll respond to each one separately.
>


> Throughout, I'm going to use my proposed ``grouped`` builtin to
> demonstrate possible revisions. Note that I am *not* suggesting a
> replacement to defaultdict. The proposal is to make a common task easier
> and more reliable. It does not satisfy all uses of defaultdict.
>

agreed -- and it shouldn't.

I'd like to see how some of these pan out with my proposed API:

either a Grouped class, or at least (key, value) iterables and/or a value
function.

I don't have time now to do them all, but for the moment:

I noticed recently that *all* examples for collection.defaultdict (
>> https://docs.python.org/3.7/library/collections.html#collections.defaultdict)
>> are cases of grouping (for an int, a list and a
>> set) from an iterator with a key, value output.
>>
>
and yet others on this thread think a (key, value) input would be rare -- I
guess it depends on whether you are thinking dict-like already.


>
>> https://frama.link/o3Hb3-4U,
>>
>
> accum = defaultdict(list)
> garbageitems = []
>
> for item in root:
>     filename = findfile(opts.fileroot, item.attrib['classname'])
>     accum[filename].append(float(item.attrib['time']))
>     if filename is None:
>         garbageitems.append(item)
>
>
> This might be more clear if separated into two parts.
>
> def keyfunc(item):
>     return findfile(opts.fileroot, item.attrib['classname'])
> groups = grouped(root, keyfunc)
> groups = {k: [float(v.attrib['time']) for v in g]
>           for k, g in groups.items()}
> garbage = groups.pop(None, [])
>

so this one is a prime case for a value function -- I think post-processing
the groups is a pretty common case -- why make people post-process it?

def keyfunc(item):
    return findfile(opts.fileroot, item.attrib['classname'])

def valuefunc(item):
    return float(item.attrib['time'])

groups = grouped(root, keyfunc, valuefunc)
garbage = groups.pop(None, [])

And the post-processing is then mixing comprehension style with key
function style (what to call that -- "functional" style?), so why not use a
(key, value) iterable:

groups = grouped((findfile(opts.fileroot, item.attrib['classname']),
                  item.attrib['time'])
                 for item in root)

OK -- that's packing a bit too much into a line, so how about:

def keyfunc(item):
return findfile(opts.fileroot, item.attrib['classname'])

groups = grouped( (keyfunc(item), item.attrib['time']) for item in root)
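For context, a minimal sketch of a ``grouped`` that supports both styles above -- hypothetical, since ``grouped`` is only a proposal and this signature is my reading of the thread, not a settled API:

```python
from collections import defaultdict

def grouped(iterable, key=None, value=None):
    # With key=None the iterable is treated as (key, value) pairs,
    # matching the example above; otherwise key (and optionally value)
    # are applied per item.
    groups = defaultdict(list)
    if key is None:
        for k, v in iterable:
            groups[k].append(v)
    else:
        for item in iterable:
            groups[key(item)].append(value(item) if value else item)
    return dict(groups)

pairs = [("a", 1), ("a", 2), ("b", 3)]
result = grouped(pairs)                             # (key, value) form
result2 = grouped([1, 2, 3, 4], key=lambda n: n % 2)  # key-function form
```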

>
> self.mapping = collections.defaultdict(set)
> for op in (op for op in graph.get_operations()):
>     if op.name.startswith(common.SKIPPED_PREFIXES):
>         continue
>     for op_input in op.inputs:
>         self.mapping[op_input].add(op)
>
>
> This is a case of a single element being added to multiple groups, which
> is your section B, below. The loop and filter could be better. It looks
> like someone intended to convert if/continue to a comprehension, but
> stopped partway through the revision.
>

yeah, this is weird --

But it does make a case for having a class with the option of using a set
to collect (which I have in an older commit of my prototype):

inputs = ((op_input, op) for op in ops for op_input in op.inputs)
groups = Grouping(inputs, key=itemgetter(0), collection=set)

otherwise, you could have a method to do it:
groups.map_on_groups(set)

(not sure I like that method name, but I hope you get the idea)
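A hedged sketch of what that method might look like on a small Grouping class -- both the class shape and the ``map_on_groups`` name are hypothetical, taken from the message above:

```python
from collections import defaultdict

class Grouping(defaultdict):
    # Minimal sketch: collect (key, value) pairs into lists.
    def __init__(self, pairs=()):
        super().__init__(list)
        for key, value in pairs:
            self[key].append(value)

    def map_on_groups(self, func):
        # Apply func (e.g. set) to each collected group.
        return {key: func(group) for key, group in self.items()}

g = Grouping([("a", 1), ("a", 1), ("b", 2)])
g.map_on_groups(set)   # {'a': {1}, 'b': {2}}
```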

OK, back to work.

-CHB


-- 

Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/OR(206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA  98115   (206) 526-6317   main reception

chris.bar...@noaa.gov


Re: [Python-ideas] grouping / dict of lists

2018-07-13 Thread Michael Selik
On Mon, Jul 2, 2018 at 8:49 AM Chris Barker  wrote:

> On Fri, Jun 29, 2018 at 11:25 PM, Guido van Rossum 
> wrote:
>
>
>> Hm, this actually feels heavier to me. But then again I never liked or
>> understood the need for Counter --
>>
>
> actually, me neither -- and partly because it's too lightweight -- that
> is, it's still a regular dict, and you pretty much have to know that to use
> it. That is, it provides a nice counting constructor, but after that, it's
> just a key:integer dict :-)
>

Counter provides ``most_common`` which is often implemented inefficiently
if written from scratch. People mistakenly use ``sorted`` instead of
``heapq.nlargest``.
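Concretely (an arbitrary toy Counter; in CPython, ``most_common(n)`` delegates to ``heapq.nlargest``):

```python
import heapq
from collections import Counter

counts = Counter("abracadabra")

# Partial selection, O(n log k): what most_common(n) does in CPython.
top2 = heapq.nlargest(2, counts.items(), key=lambda kv: kv[1])

# The from-scratch version people tend to write: a full O(n log n) sort.
top2_sorted = sorted(counts.items(), key=lambda kv: kv[1], reverse=True)[:2]

assert top2 == counts.most_common(2) == top2_sorted
```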


Re: [Python-ideas] Fwd: grouping / dict of lists

2018-07-13 Thread Michael Selik
Thanks for linking to these. I looked at many of them in my own research,
but for some reason didn't think to write down the links. I'll respond to
each one separately.

Throughout, I'm going to use my proposed ``grouped`` builtin to demonstrate
possible revisions. Note that I am *not* suggesting a replacement to
defaultdict. The proposal is to make a common task easier and more
reliable. It does not satisfy all uses of defaultdict.

https://github.com/selik/peps/blob/master/pep-.rst


On Fri, Jul 13, 2018 at 6:17 AM Nicolas Rolin 
wrote:

> I noticed recently that *all* examples for collection.defaultdict (
> https://docs.python.org/3.7/library/collections.html#collections.defaultdict)
> are cases of grouping (for an int, a list and a set) from an iterator with
> a key, value output.
>
> I wondered how common those constructions were, and what are defaultdict
> used for else. So I took a little dive into a few libs to see it (std lib,
> pypy, pandas, tensorflow, ..), and I saw essentially :
>


> A) basic cases of "grouping" with a simple for loop and a
> default_dict[key].append(value). I saw many kind of default factory
> utilized, with list, int, set, dict, and even defaultdict(list). ex :
> https://frama.link/UtNqvpvb,
>

facesizes = collections.defaultdict(int)
for face in cubofaces:
    facesizes[len(face)] += 1

This is more specifically a Counter. It might look better as:

facesizes = collections.Counter(len(face) for face in cubofaces)



https://frama.link/o3Hb3-4U,
>

accum = defaultdict(list)
garbageitems = []

for item in root:
    filename = findfile(opts.fileroot, item.attrib['classname'])
    accum[filename].append(float(item.attrib['time']))
    if filename is None:
        garbageitems.append(item)


This might be more clear if separated into two parts.

def keyfunc(item):
    return findfile(opts.fileroot, item.attrib['classname'])

groups = grouped(root, keyfunc)
groups = {k: [float(v.attrib['time']) for v in g]
          for k, g in groups.items()}
garbage = groups.pop(None, [])


Interestingly, this code later makes a distinction between ``garbage`` and
``garbageitems`` even though they appear to be the same. The programmer
might not have made that error (?) if using the ``grouped`` function,
because the act of grouping becomes single-purpose.




> https://frama.link/dw92yJ1q
>

In this case, the grouping is being performed by Pandas, not by the
defaultdict.

expected = defaultdict(dict)
for n1, gp1 in data.groupby('A'):
    for n2, gp2 in gp1.groupby('B'):
        expected[n1][n2] = op(gp2.loc[:, ['C', 'D']])


However, this is still relevant to our discussion, because the defaultdict
had to be converted to a regular dict afterward. Presumably to ensure that
missing keys cause KeyErrors.

expected = dict((k, DataFrame(v))
                for k, v in compat.iteritems(expected))


If you felt like one-lining this, it might look like

expected = {n1: DataFrame({n2: op(gp2.loc[:, ['C', 'D']])
                           for n2, gp2 in gp1})
            for n1, gp1 in data.groupby('A')}

I'm not saying that's more clear, but the fact that's possible emphasizes
that the grouping is performed by Pandas. Grouping cannot be one-lined if
restricted to Python standard library.




> https://frama.link/1Gqoa7WM
>

results = defaultdict(dict)
for i, k1 in enumerate(arg1.columns):
    for j, k2 in enumerate(arg2.columns):
        if j < i and arg2 is arg1:
            # Symmetric case
            results[i][j] = results[j][i]
        else:
            results[i][j] = f(*_prep_binary(arg1.iloc[:, i],
                                            arg2.iloc[:, j]))


The nested loop looks like a cross-product. It might be clearer using
itertools, but I wouldn't use ``grouped`` here.

cross = itertools.product(enumerate(arg1.columns),
                          enumerate(arg2.columns))




> https://frama.link/bWswbHsU,
>

self.mapping = collections.defaultdict(set)
for op in (op for op in graph.get_operations()):
    if op.name.startswith(common.SKIPPED_PREFIXES):
        continue
    for op_input in op.inputs:
        self.mapping[op_input].add(op)


This is a case of a single element being added to multiple groups, which is
your section B, below. The loop and filter could be better. It looks like
someone intended to convert if/continue to a comprehension, but stopped
partway through the revision.

ops = (op for op in graph.get_operations()
       if not op.name.startswith(common.SKIPPED_PREFIXES))


I'm not sure how I like it, but here's a revision to use ``grouped``:

inputs = ((op_input, op) for op in ops for op_input in op.inputs)
groups = grouped(inputs, key=itemgetter(0))
groups = {k: set(v for _, v in g) for k, g in groups.items()}


https://frama.link/SZh2q8pS
>

handler_results = collections.defaultdict(list)
for handler in per_handler_ops.keys():
    for op in per_handler_ops[handler]:

Re: [Python-ideas] Add the imath module

2018-07-13 Thread George Leslie-Waksman
Instead of `imath`, how about `math.integer` for the module name?

On Fri, Jul 13, 2018 at 9:20 AM Tim Peters  wrote:

> [Steven D'Aprano]
>>  > about 4.7 seconds to test 2**800 + 1;
>>
>> [Jeroen Demeyer]
>>
>>> In SageMath:
>>>
>>> sage: n = 2**800+1; timeit('is_prime(n)')
>>> 625 loops, best of 3: 303 µs per loop
>>>
>>> That's 4 orders of magnitude faster...
>>
>>
> [Tim]
>
>> More like 6, yes?
>>
>
> Heh - sorry about that.  A speck of dirt on my monitor made me read "303"
> as "3.03".
>
>
>>   My Python version is more like 3, but is way fast enough for most
>> things I do:
>>
>
> So my Python Miller-Rabin code (already shared) was about one order of
> magnitude slower.
>  ...
>


Re: [Python-ideas] Add the imath module

2018-07-13 Thread Tim Peters
>
> [Steven D'Aprano]
>  > about 4.7 seconds to test 2**800 + 1;
>
> [Jeroen Demeyer]
>
>> In SageMath:
>>
>> sage: n = 2**800+1; timeit('is_prime(n)')
>> 625 loops, best of 3: 303 µs per loop
>>
>> That's 4 orders of magnitude faster...
>
>
[Tim]

> More like 6, yes?
>

Heh - sorry about that.  A speck of dirt on my monitor made me read "303"
as "3.03".


>   My Python version is more like 3, but is way fast enough for most things
> I do:
>

So my Python Miller-Rabin code (already shared) was about one order of
magnitude slower.
 ...


Re: [Python-ideas] Add the imath module

2018-07-13 Thread Danilo J. S. Bellini
If we have a fast and exact primality test up to 2**64,
a signature like this would help:

def is_prime(n, probabilistic=False):

When probabilistic is False and the number is beyond 2**64,
it'll raise OverflowError.
When probabilistic is True, it's unbounded, but probabilistic.
The docs should be explicit about that, though.

Instead of an OverflowError,
a number beyond 2**64 might simply use a slower strategy.
But to make it an explicit option, the signature could be:

def is_prime(n, exact=True, unbounded=False):

By telling it should be both exact and unbounded,
the user is aware that it might be slow.

The alternatives are:

- exact, unbounded: Slower test
- not exact, unbounded: Probabilistic test
- exact, bounded: Default, raises OverflowError beyond 2**64
- not exact, bounded: Invalid, but it can just ignore the exact input

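That dispatch can be sketched as follows. The structure just mirrors the table above and is my reading of the proposal, not a settled API; the fixed witness set 2..37 is known to make Miller-Rabin deterministic below 2**64, while the exact-and-unbounded branch (e.g. ECPP) is left as a stub:

```python
import random

def _miller_rabin(n, bases):
    # Miller-Rabin for odd n > 3 against the given witness bases.
    d, s = n - 1, 0
    while d % 2 == 0:
        d //= 2
        s += 1
    for a in bases:
        a %= n
        if a == 0:
            continue
        x = pow(a, d, n)
        if x in (1, n - 1):
            continue
        for _ in range(s - 1):
            x = x * x % n
            if x == n - 1:
                break
        else:
            return False   # a witnesses that n is composite
    return True

def is_prime(n, exact=True, unbounded=False):
    if n < 2:
        return False
    if n in (2, 3):
        return True
    if n % 2 == 0:
        return False
    if n < 2**64:
        # Deterministic below 2**64 with this fixed base set.
        return _miller_rabin(n, (2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37))
    if not exact:
        # Probabilistic, unbounded: random witnesses.
        return _miller_rabin(n, [random.randrange(2, n - 1) for _ in range(25)])
    if not unbounded:
        raise OverflowError("beyond 2**64: pass exact=False or unbounded=True")
    raise NotImplementedError("a certified test such as ECPP would go here")
```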
-- 
Danilo J. S. Bellini
---
"*It is not our business to set up prohibitions, but to arrive at
conventions.*" (R. Carnap)


Re: [Python-ideas] Add the imath module

2018-07-13 Thread Tim Peters
>
> [Steven D'Aprano]
 > about 4.7 seconds to test 2**800 + 1;

[Jeroen Demeyer]

> In SageMath:
>
> sage: n = 2**800+1; timeit('is_prime(n)')
> 625 loops, best of 3: 303 µs per loop
>
> That's 4 orders of magnitude faster...
>

More like 6, yes?  My Python version is more like 3, but is way fast enough
for most things I do:

$ py -m timeit -s "from temp3 import pp; n=2**800+1" "pp(n)"
200 loops, best of 5: 1.71 msec per loop

$ py -m timeit -s "from temp3 import pp; n=2**900+1" "pp(n)"
100 loops, best of 5: 2.32 msec per loop

$ py -m timeit -s "from temp3 import pp; n=2**1000+1" "pp(n)"
100 loops, best of 5: 3.15 msec per loop

Steven's numbers are pretty baffling to me, since these are all composite
and so iterating Miller-Rabin "should get out" pretty fast:

> about 4.7 seconds to test 2**800 + 1;
> less than a tenth of a millisecond to test 2**900 + 1;
> and about 8.6 seconds to test 2**1000 + 1.


Here's the Python code I used:

def pp(n, k=10):
    """
    Return True iff n is a probable prime, using k Miller-Rabin tests.

    When False, n is definitely composite.
    """
    from random import randrange

    if n < 4:
        return 2 <= n <= 3
    d = nm1 = n-1
    s = 0
    while d & 1 == 0:
        s += 1
        d >>= 1
    if s == 0:
        return False # n >= 4 and even
    ss = range(1, s)
    for _ in range(k):
        a = randrange(2, nm1)
        x = pow(a, d, n)
        if x == 1 or x == nm1:
            continue
        for _ in ss:
            x = x*x % n
            if x == nm1:
                break
            if x == 1:
                return False
        else:
            return False
    return True


Re: [Python-ideas] Add the imath module

2018-07-13 Thread Chris Angelico
On Sat, Jul 14, 2018 at 1:05 AM, Steven D'Aprano  wrote:
> On Fri, Jul 13, 2018 at 02:51:18PM +0200, Jacco van Dorp wrote:
>
>> I think the extra 9 characters of "_probable" are worth the extra accuracy.
>>
>> Don't teach people to lie in their function names. It's a bad example.
>> And if you really want to lie, just
>
> In what way is
>
> is_probable_prime(11)  # returns True
>
> less true than
>
> isprime(11)  # returns True
>
> If you want to *lie*, then claiming that 11 is a probable prime is
> certainly a lie.
>

It's a probable lie?

ChrisA


Re: [Python-ideas] Add the imath module

2018-07-13 Thread Steven D'Aprano
On Fri, Jul 13, 2018 at 02:51:18PM +0200, Jacco van Dorp wrote:

> I think the extra 9 characters of "_probable" are worth the extra accuracy.
>
> Don't teach people to lie in their function names. It's a bad example.
> And if you really want to lie, just

In what way is 

is_probable_prime(11)  # returns True

less true than

isprime(11)  # returns True

If you want to *lie*, then claiming that 11 is a probable prime is 
certainly a lie.


-- 
Steve


Re: [Python-ideas] Fwd: grouping / dict of lists

2018-07-13 Thread Nicolas Rolin
I noticed recently that *all* examples for collection.defaultdict (
https://docs.python.org/3.7/library/collections.html#collections.defaultdict)
are cases of grouping (for an int, a list and a set) from an iterator with
a key, value output.

I wondered how common those constructions were, and what are defaultdict
used for else. So I took a little dive into a few libs to see it (std lib,
pypy, pandas, tensorflow, ..), and I saw essentially :
A) basic cases of "grouping" with a simple for loop and a
default_dict[key].append(value). I saw many kind of default factory
utilized, with list, int, set, dict, and even defaultdict(list). ex :
https://frama.link/UtNqvpvb, https://frama.link/o3Hb3-4U,
https://frama.link/dw92yJ1q, https://frama.link/1Gqoa7WM,
https://frama.link/bWswbHsU, https://frama.link/SZh2q8pS
B) cases of grouping, but where the for loop used was feeding more than
one "grouper", which is pretty annoying if we want to group something. ex:
https://frama.link/Db-Ny49a, https://frama.link/bZakUR33,
https://frama.link/MwJFqh5o,
C) class attribute initialization (grouping is done by repeatedly
calling a function, so any grouping constructor will be useless here). ex :
https://frama.link/GoGWuQwR, https://frama.link/BugcS8wU
D) Sometimes you just want a defaultdict inside a defaultdict inside a dict
and just have fun : https://frama.link/asBNLr1g,
https://frama.link/8j7gzfA5

From what I saw, the most useful addition would be a method on defaultdict
to fill it from an iterable, using a grouping operation adapted to the
default_factory (so __add__ for list, int and str, add for set, update for
dict, and probably __add__ for anything else).

A sample code would be :

from collections import defaultdict

class groupingdict(defaultdict):
    def group_by_iterator(self, iterator):
        empty_element = self.default_factory()
        # Check __add__ first (list, int, str), then add (set) before
        # update, since sets have both add and update.
        if hasattr(empty_element, "__add__"):
            for key, element in iterator:
                self[key] += element
        elif hasattr(empty_element, "add"):
            for key, element in iterator:
                self[key].add(element)
        elif hasattr(empty_element, "update"):
            for key, element in iterator:
                self[key].update(element)
        else:
            raise TypeError('default factory does not support iteration')
        return self

So that for example :
>groupingdict(dict).group_by_iterator(
(grouping_key, a_dict) for grouping_key, a_dict in [
(1, {'a': 'c'}),
(1, {'b': 'f'}),
(1, {'a': 'e'}),
(2, {'a': 'e'})
]
)
returns

>groupingdict(dict, {1: {'a': 'e', 'b': 'f'}, 2: {'a': 'e'}})


My implementation is garbage and there should be two methods, one returning
the object and one modifying it, but I think it gives more leeway than just
a function returning a dict.


2018-07-13 7:11 GMT+02:00 Chris Barker via Python-ideas <
python-ideas@python.org>:

> On Mon, Jul 9, 2018 at 5:55 PM, Franklin? Lee <
> leewangzhong+pyt...@gmail.com> wrote:
>
>> >> - The storage container.
>> >
>> >
>> > so this means you're passing in a full set of storage containers? I'm a
>> > bit confused by that -- if they might be pre-populated, then they would
>> > need to be instances, and you'd need to have one for every key -- how
>> > would you know in advance what you needed?
>>
>> No, I mean the mapping (outer) container. For example, I can pass in
>> an empty OrderedDict, or a dict that already contained some groups
>> from a previous call to the grouping function.
>>
>
> Sure -- that's what my prototype does if you pass a Mapping in (or use
> .update() )
>
> why not?
>
> -CHB
>
> --
>
> Christopher Barker, Ph.D.
> Oceanographer
>
> Emergency Response Division
> NOAA/NOS/OR(206) 526-6959   voice
> 7600 Sand Point Way NE   (206) 526-6329   fax
> Seattle, WA  98115   (206) 526-6317   main reception
>
> chris.bar...@noaa.gov
>
>


-- 

--
*Nicolas Rolin* | Data Scientist
+33 631992617 - nicolas.ro...@tiime.fr

15 rue Auber, 75009 Paris
www.tiime.fr


Re: [Python-ideas] Add the imath module

2018-07-13 Thread David Mertz
There is a VERY slow certain test for primality: divide by every candidate
factor (up to sqrt(N)). This is certainly not what we want for very large
numbers.

But the uncertainty of the probabilistic tests is very small. Steven
likened the odds of the test being wrong to "less than the chance of a
cosmic ray hitting a memory cell and altering the result." In any case, an
amateur programmer could write a tight loop testing random large numbers
and leave it running for the rest of their life without ever encountering a
false positive. Number theorists have constructed these false positives...
But if you know enough to do that construction, this isn't going to confuse
you.

On Fri, Jul 13, 2018, 4:44 AM Jacco van Dorp  wrote:

> If there are going to be fallible primality tests, there should
> also exist certain ones, no matter how slow they may be.
>
> Python is often a language for beginners. I think it'd be confusing to
> a lot of people if there are tests that show "This one is
> probably prime" instead of sure tests.
>
> I'd call the functions is_prime() and is_likely_prime() (or
> is_likely_not_prime(), depending on where it fails), to make it really
> clear which you're picking and avoid confusion.
>
> 2018-07-13 10:21 GMT+02:00 Jeroen Demeyer :
> > On 2018-07-13 04:02, Chris Angelico wrote:
> >>
> >> Haha. Okay. I'm not familiar with every possible primality test, so I
> >> had no idea that they ALL have the same failure mode :)
> >
> >
> > There exist guaranteed primality tests, but those are much slower and
> more
> > complicated. The state of the art is ECPP:
> >
> > https://en.wikipedia.org/wiki/Elliptic_curve_primality
> >


Re: [Python-ideas] Add the imath module

2018-07-13 Thread Stephan Houben
Should we then call sort "probable_sort", since a non-zero probability
exists of it going wrong due to a stray cosmic ray?

It's pointless to worry about failure modes which are orders of magnitude
unlikelier than hardware failure.

Stephan

Op vr 13 jul. 2018 14:45 schreef Jeroen Demeyer :

> On 2018-07-13 14:11, Steven D'Aprano wrote:
> > What it *actually* does is:
> >
> >
> is_almost_certainly_prime_except_for_a_ludicrously_microscopic_chance_of_error_thousands_of_times_less_likely_than_a_stray_cosmic_ray_flipping_a_bit_in_memory_and_causing_the_wrong_result_to_be_returned()
>
> That's just a long variant of is_probable_prime()
>
> > If your bank is satisfied with "mere probable prime number" to transfer
> > billions of dollars around the world, then I'm sure that the users of
> > Python's std library should be too.
>
> That's besides the point. I agree that probable primes are good enough,
> just don't call them "prime".


Re: [Python-ideas] Add the imath module

2018-07-13 Thread Jacco van Dorp
I think the extra 9 characters of "_probable" are worth the extra accuracy.

Don't teach people to lie in their function names. It's a bad example.
And if you really want to lie, just

from imath import is_probably_prime as is_prime

2018-07-13 14:44 GMT+02:00 Jeroen Demeyer :
> On 2018-07-13 14:11, Steven D'Aprano wrote:
>>
>> What it *actually* does is:
>>
>>
>> is_almost_certainly_prime_except_for_a_ludicrously_microscopic_chance_of_error_thousands_of_times_less_likely_than_a_stray_cosmic_ray_flipping_a_bit_in_memory_and_causing_the_wrong_result_to_be_returned()
>
>
> That's just a long variant of is_probable_prime()
>
>> If your bank is satisfied with "mere probable prime number" to transfer
>> billions of dollars around the world, then I'm sure that the users of
>> Python's std library should be too.
>
>
> That's beside the point. I agree that probable primes are good enough, just
> don't call them "prime".
>


Re: [Python-ideas] Add the imath module

2018-07-13 Thread Jeroen Demeyer

On 2018-07-13 14:11, Steven D'Aprano wrote:

What it *actually* does is:

is_almost_certainly_prime_except_for_a_ludicrously_microscopic_chance_of_error_thousands_of_times_less_likely_than_a_stray_cosmic_ray_flipping_a_bit_in_memory_and_causing_the_wrong_result_to_be_returned()


That's just a long variant of is_probable_prime()


If your bank is satisfied with "mere probable prime number" to transfer
billions of dollars around the world, then I'm sure that the users of
Python's std library should be too.


That's beside the point. I agree that probable primes are good enough, 
just don't call them "prime".



Re: [Python-ideas] Add the imath module

2018-07-13 Thread Jeroen Demeyer

On 2018-07-13 14:26, Steven D'Aprano wrote:

I'm pretty sure that Sage is using numpy


This computation uses PARI/GP, a dedicated library for number theory 
(https://pari.math.u-bordeaux.fr/) with a Python interface 
(https://github.com/defeo/cypari2).



And it's
probably making far less conservative choices about the rate of false
positives it is willing to accept.


It's an actual primality test, not a probable primality test. But for 
numbers which are not prime (like your examples), the difference doesn't 
actually matter.



Re: [Python-ideas] Add the imath module

2018-07-13 Thread Steven D'Aprano
On Fri, Jul 13, 2018 at 10:10:00AM +0200, Jeroen Demeyer wrote:
> On 2018-07-12 20:11, Steven D'Aprano wrote:
> >about 4.7 seconds to test 2**800 + 1;
> 
> In SageMath:
> 
> sage: n = 2**800+1; timeit('is_prime(n)')
> 625 loops, best of 3: 303 µs per loop
> 
> That's 4 orders of magnitude faster...

Go on, rub it in why don't you...

I'm pretty sure that Sage is using numpy, which would almost certainly 
have a C-based implementation, not pure Python like mine. And it's 
probably making far less conservative choices about the rate of false 
positives it is willing to accept.

And your computer is probably newer and faster than mine :-(

My point was that for casual use, even the bad cases are still fast 
enough for interactive use. If you think 5 seconds is ridiculously slow, 
you've never tried a naive trial division algorithm that took literally 
days to crawl through the millions of divisions needed.

*wink*
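The contrast Steven describes is easy to see in code: naive trial division needs on the order of sqrt(n) divisions, roughly 2**400 of them for 2**800 + 1, while each Miller-Rabin round is a single modular exponentiation. A sketch of the naive approach:

```python
from math import isqrt

def is_prime_trial_division(n):
    """Naive O(sqrt(n)) primality test: fine for small n, hopeless for
    big ones.  For n = 2**800 + 1 it would need about 2**400 divisions."""
    if n < 2:
        return False
    for d in range(2, isqrt(n) + 1):
        if n % d == 0:
            return False
    return True
```

Perfectly usable for numbers of a dozen digits or so; utterly infeasible for the hundreds-of-digits inputs being timed above.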



-- 
Steve


Re: [Python-ideas] Add the imath module

2018-07-13 Thread Steven D'Aprano
On Fri, Jul 13, 2018 at 01:36:55PM +0200, Jeroen Demeyer wrote:
> On 2018-07-13 11:50, Jacco van Dorp wrote:
> >At the very least, use is_likely_prime() as function name.
> 
> I agree with you here: the function name should reflect what it actually 
> does.

What it *actually* does is:

is_almost_certainly_prime_except_for_a_ludicrously_microscopic_chance_of_error_thousands_of_times_less_likely_than_a_stray_cosmic_ray_flipping_a_bit_in_memory_and_causing_the_wrong_result_to_be_returned()

*At most*, I would agree that the isprime() function could raise a 
"Probably Prime" warning when necessary, with the warning disabled by 
default. Those that care about this sort of thing can enable warnings, 
those that don't shouldn't have to read scary warnings that have no 
practical meaning.
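The warning idea above could be sketched roughly like this. Everything here is invented for illustration: ProbablePrimeWarning and the 2**64 cutoff are hypothetical names and values, and a plain Fermat test stands in for whatever probabilistic backend a real imath module would use:

```python
import random
import warnings

class ProbablePrimeWarning(UserWarning):
    """Hypothetical warning category (name invented for this sketch)."""

# Below this (illustrative) bound, pretend the answer is certain;
# above it, the result is only probabilistic.
_CERTAIN_BOUND = 2**64

def _probable_prime(n, rounds=30):
    # Stand-in probabilistic backend: a plain Fermat test.  A real
    # module would use Miller-Rabin or BPSW instead.
    if n < 4:
        return n in (2, 3)
    if n % 2 == 0:
        return False
    return all(pow(random.randrange(2, n - 1), n - 1, n) == 1
               for _ in range(rounds))

def isprime(n):
    result = _probable_prime(n)
    if result and n >= _CERTAIN_BOUND:
        warnings.warn("isprime(): result is only probably prime",
                      ProbablePrimeWarning, stacklevel=2)
    return result

# Disabled by default, as suggested above; those who care can re-enable
# it with warnings.simplefilter("always", ProbablePrimeWarning).
warnings.simplefilter("ignore", ProbablePrimeWarning)
```

Users who want the scary details opt in with one `simplefilter` call; everyone else never sees the warning.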

If your bank is satisfied with "mere probable prime number" to transfer 
billions of dollars around the world, then I'm sure that the users of 
Python's std library should be too.


-- 
Steve


Re: [Python-ideas] Add the imath module

2018-07-13 Thread Steven D'Aprano
On Fri, Jul 13, 2018 at 10:43:12AM +0200, Jacco van Dorp wrote:
> If there are going to be fallible primality tests, there should
> also exist certain ones, no matter how slow they may be.
>
> Python is often a language for beginners. I think it'd be confusing to
> a lot of people that there are going to be tests that show "This one is
> probably prime", instead of sure tests.

I think that it is precisely because this will be used by beginners that 
we shouldn't offer such a choice. It's our job to curate the APIs in the 
std lib, not to burden inexpert users with excess choices that will only 
worry them.

If this were a third-party module, I would say, "the more the merrier". 
Let it provide a dozen different implementations, that's cool. But the 
stdlib is batteries, not nuclear reactors. If you want a choice of 
algorithms, or the ability to configure space versus time versus 
guarantees, I think you ought to be using a 3rd party module, not the 
std library.


I'm not saying hide the fact that it is probabilistic, but that's an 
implementation detail that *in practice* makes no difference at all. 
Using Miller-Rabin and 30 randomly chosen witnesses, the probability of 
a wrong answer is < 4**-30 (less than 1 in 10**18). If we tested a 
million numbers a second, it would take an average of 18,000+ years to 
come across a single such error.

In comparison, your computer probably experiences something of the order 
of 1 cosmic-ray induced bit-flip causing data corruption per week.

https://stackoverflow.com/questions/2580933/cosmic-rays-what-is-the-probability-they-will-affect-a-program
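As a back-of-envelope check of the figures in this paragraph (assuming the textbook 4**-30 error bound for Miller-Rabin with 30 random witnesses, and the million-numbers-per-second rate quoted above):

```python
# Expected time to see a single Miller-Rabin error at the quoted rates.
p_error = 4.0 ** -30                   # error bound with 30 random witnesses
rate = 1_000_000                       # numbers tested per second
seconds_per_year = 365.25 * 24 * 3600
years = 1 / (p_error * rate) / seconds_per_year
print(f"error bound {p_error:.1e}: one expected error every {years:,.0f} years")
```

This works out to roughly 36,500 years, comfortably on the "18,000+ years" side.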

Since using probabilistic methods is good enough for PGP and other 
high-security protocols, it's surely good enough for Tom Schoolboy 
testing whether 1234567890123456789 is prime.

(For the record: it isn't.)


-- 
Steve


Re: [Python-ideas] Add the imath module

2018-07-13 Thread Jeroen Demeyer

On 2018-07-13 11:50, Jacco van Dorp wrote:

At the very least, use is_likely_prime() as function name.


I agree with you here: the function name should reflect what it actually 
does. (Note that the technical term is "probable prime", so it should be 
is_probable_prime)


But I don't think that the Python stdlib should have a proven is_prime() 
function because that would be complicated, slow and have very limited 
benefit over is_probable_prime().
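For concreteness, a minimal is_probable_prime() of the kind being discussed might look like this textbook Miller-Rabin sketch (random witnesses; for composite n the chance of a wrong True is at most 4**-rounds). This is only a sketch, not a proposed stdlib API:

```python
import random

def is_probable_prime(n, rounds=30):
    """Miller-Rabin: False means definitely composite; True means prime
    with error probability at most 4**-rounds."""
    if n < 2:
        return False
    for p in (2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37):
        if n % p == 0:
            return n == p
    # Write n - 1 as d * 2**s with d odd.
    d, s = n - 1, 0
    while d % 2 == 0:
        d //= 2
        s += 1
    for _ in range(rounds):
        a = random.randrange(2, n - 1)
        x = pow(a, d, n)
        if x in (1, n - 1):
            continue
        for _ in range(s - 1):
            x = pow(x, 2, n)
            if x == n - 1:
                break
        else:
            return False  # a is a witness that n is composite
    return True
```

Note that a False answer is always certain; only a True answer carries the (astronomically small) error probability.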



Re: [Python-ideas] Add the imath module

2018-07-13 Thread Jacco van Dorp
2018-07-13 11:13 GMT+02:00 Jeroen Demeyer :
> On 2018-07-13 10:43, Jacco van Dorp wrote:
>>
>> If there are going to be fallible primality tests, there should
>> also exist certain ones, no matter how slow they may be.
>
>
> A certain primality test would be too specialized for the Python stdlib. If
> you really need that (which you typically don't), there are existing number
> theory packages.

You forgot to include my reasons for this:

>> Python is often a language for beginners. I think it'd be confusing to
>> a lot of people that there are going to be tests that show "This one is
>> probably prime", instead of sure tests.

Depending on function names, of course. It needn't be the fastest
available, but it's what people would expect, especially those with
less experience. Also, there are existing number theory packages for
everything suggested here, so that's not an argument.

At the very least, use is_likely_prime() as the function name. Otherwise,
the name of the function is just plain wrong. Don't claim accuracy you
don't have.


Re: [Python-ideas] Add the imath module

2018-07-13 Thread Jeroen Demeyer

On 2018-07-13 10:43, Jacco van Dorp wrote:

If there are going to be fallible primality tests, there should
also exist certain ones, no matter how slow they may be.


A certain primality test would be too specialized for the Python stdlib. 
If you really need that (which you typically don't), there are existing 
number theory packages.
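For what it's worth, a certain test needn't be slow or exotic for bounded inputs: Miller-Rabin with the first twelve primes as fixed bases is known to be deterministic for all n below roughly 3*10**23, which covers every 64-bit integer. A sketch:

```python
def is_prime_certain(n):
    """Deterministic primality test for n < ~3*10**23 (in particular, all
    64-bit integers): Miller-Rabin with the first twelve primes as bases."""
    if n < 2:
        return False
    bases = (2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37)
    for p in bases:
        if n % p == 0:
            return n == p
    # Write n - 1 as d * 2**s with d odd.
    d, s = n - 1, 0
    while d % 2 == 0:
        d //= 2
        s += 1
    for a in bases:
        x = pow(a, d, n)
        if x in (1, n - 1):
            continue
        for _ in range(s - 1):
            x = pow(x, 2, n)
            if x == n - 1:
                break
        else:
            return False  # a witnesses that n is composite
    return True
```

Beyond that bound (e.g. the cryptographic-size numbers discussed in this thread), one is back to probable-prime territory or to genuinely heavy machinery like ECPP.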



Re: [Python-ideas] Add the imath module

2018-07-13 Thread Jacco van Dorp
If there are going to be fallible primality tests, there should
also exist certain ones, no matter how slow they may be.

Python is often a language for beginners. I think it'd be confusing to
a lot of people that there are going to be tests that show "This one is
probably prime", instead of sure tests.

I'd call the functions is_prime() and is_likely_prime() (or
is_likely_not_prime(), depending on where it fails), to make it really
clear which you're picking and avoid confusion.

2018-07-13 10:21 GMT+02:00 Jeroen Demeyer :
> On 2018-07-13 04:02, Chris Angelico wrote:
>>
>> Haha. Okay. I'm not familiar with every possible primality test, so I
>> had no idea that they ALL have the same failure mode :)
>
>
> There exist guaranteed primality tests, but those are much slower and more
> complicated. The state of the art is ECPP:
>
> https://en.wikipedia.org/wiki/Elliptic_curve_primality
>


Re: [Python-ideas] Add the imath module

2018-07-13 Thread Jeroen Demeyer

On 2018-07-13 04:02, Chris Angelico wrote:

Haha. Okay. I'm not familiar with every possible primality test, so I
had no idea that they ALL have the same failure mode :)


There exist guaranteed primality tests, but those are much slower and 
more complicated. The state of the art is ECPP:


https://en.wikipedia.org/wiki/Elliptic_curve_primality


Re: [Python-ideas] Add the imath module

2018-07-13 Thread Jeroen Demeyer

On 2018-07-12 20:11, Steven D'Aprano wrote:

about 4.7 seconds to test 2**800 + 1;


In SageMath:

sage: n = 2**800+1; timeit('is_prime(n)')
625 loops, best of 3: 303 µs per loop

That's 4 orders of magnitude faster...