[issue41311] Add a function to get a random sample from an iterable (reservoir sampling)

2021-07-02 Thread Oscar Benjamin
Oscar Benjamin added the comment: I was contacted by someone interested in this so I've posted the last version above as a GitHub gist under the MIT license: https://gist.github.com/oscarbenjamin/4c1b977181f34414a425f68589e895d1

[issue43602] Include Decimal's in numbers.Real

2021-04-16 Thread Oscar Benjamin
Oscar Benjamin added the comment: I've never found numbers.Real/Complex to be useful. The purpose of the ABCs should be that they enable you to write code that works for instances of any subclass but in practice writing good floating point code requires knowing something e.g. the base

[issue41311] Add a function to get a random sample from an iterable (reservoir sampling)

2020-07-19 Thread Oscar Benjamin
Oscar Benjamin added the comment: Yeah, I guess it's a YAGNI. Thanks Raymond and Tim for looking at this!

[issue41311] Add a function to get a random sample from an iterable (reservoir sampling)

2020-07-18 Thread Oscar Benjamin
Oscar Benjamin added the comment: > Please don't get personal. Sorry, that didn't come across with the intended tone :) I agree that this could be out of scope for the random module but I wanted to make sure the reasons were considered. Reading between the lines I get the impress

[issue41311] Add a function to get a random sample from an iterable (reservoir sampling)

2020-07-17 Thread Oscar Benjamin
Oscar Benjamin added the comment: > At its heart, this a CPython optimization to take advantage of list() being slower than a handful of islice() calls. This comment suggests that you have missed the general motivation for reservoir sampling. Of course the stdlib can not satisfy a

[issue41311] Add a function to get a random sample from an iterable (reservoir sampling)

2020-07-17 Thread Oscar Benjamin
Oscar Benjamin added the comment: All good points :) Here's an implementation with those changes and that shuffles but gives the option to preserve order. It also handles the case W=1.0 which can happen at the first step with probability 1 - (1 - 2**-53)**k. Attempting to preserve order

[issue41311] Add a function to get a random sample from an iterable (reservoir sampling)

2020-07-16 Thread Oscar Benjamin
Oscar Benjamin added the comment: To be clear I suggest that this could be a separate function from the existing sample rather than a replacement or a routine used internally. The intended use-cases for the separate function are: 1. Select from something where you really do not want

[issue41311] Add a function to get a random sample from an iterable (reservoir sampling)

2020-07-15 Thread Oscar Benjamin
New submission from Oscar Benjamin : The random.choice/random.sample functions will only accept a sequence to select from. Can there be a function in the random module for selecting from an arbitrary iterable? It is possible to make an efficient function that can make random selections from
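A minimal sketch of the idea, assuming the classic single-pass "Algorithm R". The name `reservoir_sample` and its `rng` parameter are illustrative only; they are not part of the random module, and this is not the implementation posted in the gist.

```python
import random
from itertools import islice


def reservoir_sample(iterable, k, rng=random):
    """Return k items chosen uniformly at random from an arbitrary iterable.

    Hypothetical helper: only one pass is made and only k items are held
    in memory at any time.
    """
    it = iter(iterable)
    reservoir = list(islice(it, k))  # fill the reservoir with the first k items
    if len(reservoir) < k:
        raise ValueError("sample larger than population")
    # The n-th item (1-based) replaces a random slot with probability k/n.
    for n, item in enumerate(it, start=k + 1):
        j = rng.randrange(n)  # uniform integer in [0, n)
        if j < k:
            reservoir[j] = item
    return reservoir


# Works on a generator, which random.sample() would reject.
picked = reservoir_sample((x * x for x in range(1000)), 5)
```

The result is not shuffled: items that stay in the reservoir keep their original slot, so a caller who needs a random order would have to shuffle afterwards.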

[issue20479] Efficiently support weight/frequency mappings in the statistics module

2019-01-20 Thread Oscar Benjamin
Oscar Benjamin added the comment: Sorry, sent too soon... > Matlab doesn't support even weighted mean as far as I can tell. There > is wmean on the matlab file exchange: https://stackoverflow.com/a/36464881/9450991 This is a separate function `wmean(data, weights)`. It has to be a se

[issue20479] Efficiently support weight/frequency mappings in the statistics module

2019-01-20 Thread Oscar Benjamin
Oscar Benjamin added the comment: > I would find it very helpful if somebody has time to do a survey of > other statistics libraries or languages (e.g. numpy, R, Octave, Matlab, > SAS etc) and see how they handle data with weights. Numpy has only sporadic support for this. The stan

[issue25412] __floordiv__ in module fraction fails with TypeError instead of returning NotImplemented

2015-10-16 Thread Oscar Benjamin
Oscar Benjamin added the comment: You should test the change with number types that don't use the number tower e.g. Decimal, sympy, gmpy2, mpf, numpy arrays etc. Few non stdlib types use the number ABCs so testing against numbers.Complex may cause a change in behaviour.
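As a small illustration of the point, using only stdlib types: Decimal is registered with numbers.Number but not with numbers.Real or numbers.Complex, so an isinstance check against the tower treats it differently from int, float and Fraction.

```python
import numbers
from decimal import Decimal
from fractions import Fraction

values = [1, 1.5, Fraction(1, 2), Decimal("0.5")]

# Which of these count as members of the numeric tower?
in_real = [isinstance(v, numbers.Real) for v in values]
in_number = [isinstance(v, numbers.Number) for v in values]

print(in_real)    # Decimal is the only one that is not a Real
print(in_number)  # all four are Numbers
```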

[issue25355] Windows 3.5 installer does not add python to "App Paths" key

2015-10-09 Thread Oscar Benjamin
New submission from Oscar Benjamin: From the mailing list: https://mail.python.org/pipermail/python-list/2015-October/697744.html ''' The new installer for 3.5 doesn't create an "App Paths" key for "python.exe" like the old installer used to do (see the old Too

[issue20575] Type handling policy for the statistics module

2014-02-09 Thread Oscar Benjamin
New submission from Oscar Benjamin: As of issue20481, the statistics module for Python 3.4 will disallow any mixing of numeric types, with the exception of int, which can mix with any other type (but only one at a time). My understanding is that this change was not necessarily considered

[issue20481] Clarify type coercion rules in statistics module

2014-02-08 Thread Oscar Benjamin
Oscar Benjamin added the comment: Close #20481: Disallow mixed type input in statistics If I understand correctly the reason for hastily pushing this patch through is that it's the safe option: disallow mixing types as a quick fix for soon to be released 3.4. If we want to allow mixing types

[issue20499] Rounding errors with statistics.variance

2014-02-07 Thread Oscar Benjamin
Oscar Benjamin added the comment: A fast Decimal.as_integer_ratio() would be useful in any case. If you're going to use decimals though then you can trap inexact and keep increasing the precision until it becomes exact. The problem is with rationals that cannot be expressed in a finite number
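The "trap inexact and keep increasing the precision" approach could look roughly like the sketch below. The helper name `exact_op` and the `max_prec` cut-off are assumptions made for illustration; the cut-off exists because, as the comment notes, some results (such as 1/3) have no finite decimal expansion.

```python
import operator
from decimal import Decimal, Inexact, localcontext


def exact_op(op, a, b, max_prec=1000):
    """Apply op(a, b) to Decimals, growing the precision until it is exact.

    Hypothetical helper: raises ArithmeticError if no exact result is
    found with at most max_prec significant digits.
    """
    prec = 28
    while prec <= max_prec:
        with localcontext() as ctx:
            ctx.prec = prec
            ctx.traps[Inexact] = True  # any rounding now raises Inexact
            try:
                return op(a, b)
            except Inexact:
                prec *= 2  # not enough digits, retry with more
    raise ArithmeticError("no exact result within max_prec digits")


x = Decimal("1." + "1" * 30)           # 31 significant digits
square = exact_op(operator.mul, x, x)  # needs 61 digits to be exact
```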

[issue20499] Rounding errors with statistics.variance

2014-02-06 Thread Oscar Benjamin
Changes by Oscar Benjamin: nosy: +wolma

[issue20481] Clarify type coercion rules in statistics module

2014-02-04 Thread Oscar Benjamin
Oscar Benjamin added the comment: I was working on the basis that we were talking about Python 3.5. But now I see that it's a 3.4 release blocker. Is it really that urgent? I think the current behaviour is very good at handling a wide range of types. It would be nice to consistently report

[issue20479] Efficiently support weight/frequency mappings in the statistics module

2014-02-03 Thread Oscar Benjamin
Oscar Benjamin added the comment: in my previous message. To support weights (float or Rational) this would have to be more sophisticated. I guess you'd do: for x,w in data.items(): T = _coerce_types(T, type(x)) xn, xd = exact_ratio(x) wn, wd = exact_ratio(w

[issue20481] Clarify type coercion rules in statistics module

2014-02-03 Thread Oscar Benjamin
Oscar Benjamin added the comment: It's not as simple as registering with an ABC. You also need to provide the interface that the ABC represents:
>>> import sympy
>>> r = sympy.Rational(1, 2)
>>> r
1/2
>>> r.numerator
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'Half
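The same point can be shown without sympy. In this sketch `Half` is a made-up stand-in class: registering it with numbers.Rational makes isinstance() succeed, but it does not give the class the attributes that the ABC promises.

```python
import numbers


class Half:
    """Hypothetical third-party rational type with no .numerator attribute."""

    def __repr__(self):
        return "1/2"


# Registration only affects isinstance()/issubclass() checks.
numbers.Rational.register(Half)

r = Half()
is_rational = isinstance(r, numbers.Rational)
has_numerator = hasattr(r, "numerator")
print(is_rational, has_numerator)
```

Code that relies on the isinstance check and then reads `r.numerator` would still fail with AttributeError.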

[issue20389] clarify meaning of xbar and mu in pvariance/variance of statistics module

2014-02-03 Thread Oscar Benjamin
Oscar Benjamin added the comment: I agree that the current wording in the doc-strings is ambiguous. It should be more careful to distinguish between mu : true/population mean xbar : estimated/sample mean I disagree that the keyword arguments should be made the same. There is an important
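For reference, a short example of the two keyword arguments as they exist in the released statistics module (`variance(data, xbar=None)` and `pvariance(data, mu=None)`). The data values are made up, and passing the sample mean as `mu` is done here only to show the difference in the divisor.

```python
import statistics

data = [1, 2, 3, 4]
xbar = statistics.mean(data)  # sample mean, 2.5

# Sample variance: squared deviations from xbar divided by n - 1.
sample_var = statistics.variance(data, xbar=xbar)

# Population variance: squared deviations from mu divided by n.
# Here mu is assumed to be the known population mean.
pop_var = statistics.pvariance(data, mu=2.5)

print(sample_var, pop_var)
```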

[issue20499] Rounding errors with statistics.variance

2014-02-03 Thread Oscar Benjamin
New submission from Oscar Benjamin: The mean/variance functions in the statistics module don't quite round correctly. The reasons for this are that although exact rational arithmetic is used internally in the _sum function it is not used throughout the module. In particular the _sum function

[issue20481] Clarify type coercion rules in statistics module

2014-02-03 Thread Oscar Benjamin
Oscar Benjamin added the comment: I agree that supporting non-stdlib types is in some ways a separate issue from how to manage coercion with mixed stdlib types. Can you provide a complete patch (e.g. hg diff > coerce_types.patch). http://docs.python.org/devguide/ There should probably also

[issue20479] Efficiently support weight/frequency mappings in the statistics module

2014-02-02 Thread Oscar Benjamin
Oscar Benjamin added the comment: On 2 February 2014 11:55, Steven D'Aprano rep...@bugs.python.org wrote: (1) separate functions, as Nick suggests: mean vs weighted_mean, stdev vs weighted_stdev This would be my preferred approach. It makes it very clear which functions are available
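A rough sketch of what option (1) might look like for the mean. `weighted_mean` is a hypothetical name taken from the quoted suggestion; no such function is claimed to exist in the statistics module, and this float-based version ignores the exact-arithmetic and type-coercion questions discussed elsewhere in the thread.

```python
from math import fsum


def weighted_mean(data, weights):
    """Return sum(x * w) / sum(w) for paired data and weights (hypothetical)."""
    data = list(data)
    weights = list(weights)
    if len(data) != len(weights):
        raise ValueError("data and weights must have the same length")
    total_weight = fsum(weights)
    if total_weight == 0:
        raise ValueError("weights must not sum to zero")
    return fsum(x * w for x, w in zip(data, weights)) / total_weight


result = weighted_mean([1, 2, 3], [1, 1, 2])  # (1 + 2 + 6) / 4
```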

[issue20481] Clarify type coercion rules in statistics module

2014-02-02 Thread Oscar Benjamin
Oscar Benjamin added the comment: Wolfgang, have you tested this with any third party numeric types from sympy, gmpy2, mpmath etc.? Last I checked no third party types implement the numbers ABCs e.g.:
>>> import sympy, numbers
>>> r = sympy.Rational(1, 2)
>>> r
1/2
>>> isinstance(r, numbers.Rational)
False

[issue12641] Remove -mno-cygwin from distutils

2013-10-01 Thread Oscar Benjamin
Oscar Benjamin added the comment: Thanks Antoine!

[issue19086] Make fsum usable incrementally.

2013-09-30 Thread Oscar Benjamin
Oscar Benjamin added the comment: I should be clearer about my intentions. I'm hoping to create an efficient and accurate sum() function for the new stdlib statistics module: http://bugs.python.org/issue18606 http://www.python.org/dev/peps/pep-0450/ The sum() function currently proposed can

[issue12641] Remove -mno-cygwin from distutils

2013-09-30 Thread Oscar Benjamin
Oscar Benjamin added the comment: Thanks for looking at this Antoine. I've attached an updated patch for Python 2.7 called check_mno_cywin_py27_2.patch. This explicitly closes the popen object in the same way as the get_versions() function immediately above. I've just signed an electronic

[issue12641] Remove -mno-cygwin from distutils

2013-09-30 Thread Oscar Benjamin
Oscar Benjamin added the comment: On 30 September 2013 12:08, Oscar Benjamin rep...@bugs.python.org wrote: I've attached an updated patch for Python 2.7 called check_mno_cywin_py27_2.patch. To be clear: I retested this patch (using the setup described above) and the results are unchanged

[issue19086] Make fsum usable incrementally.

2013-09-30 Thread Oscar Benjamin
Oscar Benjamin added the comment: Fair enough. Thanks again for taking the time to look at this.

[issue19086] Make fsum usable incrementally.

2013-09-28 Thread Oscar Benjamin
Oscar Benjamin added the comment: Thanks for responding Raymond. Raymond Hettinger wrote: A start argument won't help you, because you will discard information on input. A sequence like [1E100, 0.1, -1E100, 0.1] wouldn't work when split into subtotal=fsum([1E100, 0.1]) and fsum([-1E100, 0.1
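The quoted example can be checked directly with math.fsum. The sketch below only demonstrates the information loss that Raymond describes; it does not show any proposed API.

```python
from math import fsum

nums = [1E100, 0.1, -1E100, 0.1]

# One call sees all the inputs and tracks the exact sum internally.
all_at_once = fsum(nums)

# Splitting the work rounds the intermediate subtotal to a float,
# so the first 0.1 is absorbed into 1E100 and lost.
subtotal = fsum(nums[:2])
in_two_steps = fsum([subtotal] + nums[2:])

print(all_at_once, in_two_steps)
```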

[issue19086] Make fsum usable incrementally.

2013-09-25 Thread Oscar Benjamin
New submission from Oscar Benjamin: I would like to be able to use fsum incrementally; however, it is not currently possible. With sum() you can do:
    subtotal = sum(nums)
    subtotal = sum(othernums, subtotal)
This wouldn't work for fsum() because the returned float is not the same as the state

[issue18821] Add .lastitem attribute to takewhile instances

2013-09-08 Thread Oscar Benjamin
Oscar Benjamin added the comment: Thank you Claudiu very much for writing a patch; I was expecting to have to do that myself! Serhiy, you're right groupby is a better fit for this. It does mean a bit of reworking for the (more complicated) sum function I'm working on but I've just checked

[issue18606] Add statistics module to standard library

2013-08-27 Thread Oscar Benjamin
Oscar Benjamin added the comment: On Aug 28, 2013 1:43 AM, janzert rep...@bugs.python.org wrote: Seems that the discussion is now down to implementation issues and the PEP is at the point of needing to ask python-dev for a PEP dictator? I would say so. AFAICT Steven has addressed all

[issue18821] Add .lastitem attribute to takewhile instances

2013-08-23 Thread Oscar Benjamin
New submission from Oscar Benjamin: I've often wanted to be able to query a takewhile object to discover the item that failed the predicate but the item is currently discarded. A usage example: def sum(items): it = iter(items) ints = takewhile(Integral.__instancecheck__
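A short demonstration of the behaviour being described, using made-up data: takewhile consumes the first item that fails the predicate from the underlying iterator, and that item cannot be recovered afterwards.

```python
from itertools import takewhile

it = iter([1, 2, 3.5, 4])

# Take leading ints; the predicate fails on 3.5.
ints = list(takewhile(lambda x: isinstance(x, int), it))

# 3.5 has already been pulled from `it` and discarded by takewhile.
rest = list(it)

print(ints, rest)
```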

[issue18606] Add statistics module to standard library

2013-08-22 Thread Oscar Benjamin
Oscar Benjamin added the comment: On 22 August 2013 03:43, Steven D'Aprano rep...@bugs.python.org wrote: If Oscar is willing, I'd like to discuss some of his ideas off-list, but that may take some time. I am willing and it will take time. I've started reading the paper that Raymond

[issue12641] Remove -mno-cygwin from distutils

2013-08-21 Thread Oscar Benjamin
Oscar Benjamin added the comment: I just noticed today that the fix implemented by these patches (only providing -mno-cygwin if gcc_ver < 4) is also used by numpy's distutils. You can see the relevant code here: https://github.com/numpy/numpy/blob/master/numpy/distutils/mingw32ccompiler.py

[issue18606] Add statistics module to standard library

2013-08-19 Thread Oscar Benjamin
Oscar Benjamin added the comment: I've just checked over the new patch and it all looks good to me apart from one quibble. It is documented that statistics.sum() will respect rounding errors due to decimal context (returning the same result that sum() would). I would prefer it if statistics.sum

[issue18606] Add statistics module to standard library

2013-08-19 Thread Oscar Benjamin
Oscar Benjamin added the comment: On 19 August 2013 17:35, Steven D'Aprano rep...@bugs.python.org wrote: Steven D'Aprano added the comment: On 19/08/13 23:15, Oscar Benjamin wrote: The final result is not accurate to 2 d.p. rounded down. This is because the decimal context has affected
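To illustrate how a decimal context can affect a running sum, here is a small sketch with made-up values (not the data from the thread). With a precision of 2 digits, the builtin sum() rounds after every addition, whereas summing exactly and rounding once gives a different answer.

```python
from decimal import Decimal, localcontext
from fractions import Fraction

data = [Decimal("1.1"), Decimal("0.04"), Decimal("0.04"), Decimal("0.04")]

# Exact rational sum, independent of any decimal context.
exact = sum(Fraction(d) for d in data)

with localcontext() as ctx:
    ctx.prec = 2
    # Each intermediate 1.14 is rounded back to 1.1.
    running = sum(data)
    # Round only once, after the exact sum has been computed.
    rounded_once = Decimal(exact.numerator) / Decimal(exact.denominator)

print(running, rounded_once)
```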

[issue18606] Add statistics module to standard library

2013-08-12 Thread Oscar Benjamin
Oscar Benjamin added the comment: On 12 August 2013 20:20, Steven D'Aprano rep...@bugs.python.org wrote: On 12/08/13 19:21, Mark Dickinson wrote: About the implementation of sum: add_partial is no longer documented as a public function, so I'm open to switching algorithms in the future

[issue18606] Add statistics module to standard library

2013-08-09 Thread Oscar Benjamin
Oscar Benjamin added the comment: One small point: I think that the argument `m` to variance, pvariance, stdev and pstdev should be renamed to `mu` for pvariance/pstdev and `xbar` for variance/stdev. The doc-strings should carefully distinguish that `mu` is the true/population mean and `xbar

[issue18606] Add statistics module to standard library

2013-08-06 Thread Oscar Benjamin
Changes by Oscar Benjamin: nosy: +oscarbenjamin

[issue18305] [patch] Fast sum() for non-numbers

2013-07-11 Thread Oscar Benjamin
Oscar Benjamin added the comment: This optimisation is a semantic change. It breaks backward compatibility in cases where a = a + b and a += b do not result in the name a having the same value. In particular this breaks backward compatibility for numpy users. Numpy arrays treat += differently
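The semantic difference can be seen with plain lists, which (like numpy arrays) implement += as an in-place operation. The two helper functions below are illustrative stand-ins, not the code from the patch under discussion.

```python
def sum_with_add(items, start):
    """Accumulate with total = total + x (never mutates start)."""
    total = start
    for x in items:
        total = total + x
    return total


def sum_with_iadd(items, start):
    """Accumulate with total += x (may mutate start in place)."""
    total = start
    for x in items:
        total += x
    return total


start_a = [0]
result_a = sum_with_add([[1], [2]], start_a)

start_b = [0]
result_b = sum_with_iadd([[1], [2]], start_b)

print(start_a, start_b)  # only start_b has been modified
```

Both functions return the same list here, but the in-place version changes the caller's `start` object as a side effect.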

[issue12641] Remove -mno-cygwin from distutils

2013-07-11 Thread Oscar Benjamin
Oscar Benjamin added the comment: I'm attaching three new patches following on from Eric and Christian's suggestions: check_mno_cywin_py27_1.patch (for Python 2.7) check_mno_cywin_py3_1.patch (for Python 3.2 and 3.3) check_mno_cywin_py34_1.patch (for Python 3.4) The py27 patch now uses

[issue12641] Remove -mno-cygwin from distutils

2013-07-09 Thread Oscar Benjamin
Oscar Benjamin added the comment: On 9 July 2013 16:25, Christian Heimes rep...@bugs.python.org wrote: The is_cygwingcc() function can be simplified a lot with subprocess.check_output(). My initial thought was to do that but then I based it on _find_exe_version which for whatever reason

[issue12641] Remove -mno-cygwin from distutils

2013-07-09 Thread Oscar Benjamin
Oscar Benjamin added the comment: On 9 July 2013 17:36, Éric Araujo rep...@bugs.python.org wrote: Don’t forget that distutils is used during CPython’s build process to compile extension modules: subprocess may not be importable then. Subprocess is imported at the top of the module in 3.x

[issue12641] Remove -mno-cygwin from distutils

2013-06-25 Thread Oscar Benjamin
Oscar Benjamin added the comment: I'm attaching one more patch check_mno_cywin_py34.patch. This is my preferred patch for Python 3.4 (default). It fixes building with MinGW and removes all support for using Cygwin gcc with --compiler=mingw32. The user would see the following error message

[issue12641] Remove -mno-cygwin from distutils

2013-06-24 Thread Oscar Benjamin
Oscar Benjamin added the comment: On 24 June 2013 09:07, Marc-Andre Lemburg rep...@bugs.python.org wrote: Could someone perhaps produce a single final patch file which can be applied to Python 2.7 and 3.2+ ? I've attached two patches check_mno_cywin_py27.patch for Python 2.7

[issue12641] Remove -mno-cygwin from distutils

2013-06-24 Thread Oscar Benjamin
Changes by Oscar Benjamin: Added file: http://bugs.python.org/file30683/test_mno_cygwin.tar.gz

[issue12641] Remove -mno-cygwin from distutils

2013-06-24 Thread Oscar Benjamin
Oscar Benjamin added the comment: On 24 June 2013 12:53, Oscar Benjamin rep...@bugs.python.org wrote: The changes are identical but the 2.7 patch didn't apply cleanly against 3.x. I'll upload the files used to test the patches in test_mno_cygwin.tar.gz. Correction: the patches are not quite

[issue18129] Fatal Python error: Cannot recover from stack overflow.

2013-06-03 Thread Oscar Benjamin
New submission from Oscar Benjamin: This is from a thread on python-list that started here: http://mail.python.org/pipermail/python-list/2013-May/647895.html There are situations in which the Python 3.2 and 3.3 interpreters crash with Fatal Python error: Cannot recover from stack overflow

[issue12641] Remove -mno-cygwin from distutils

2013-05-25 Thread Oscar Benjamin
Oscar Benjamin added the comment: On 25 May 2013 04:43, Renato Silva rep...@bugs.python.org wrote: Renato Silva added the comment: Hi Oscar! Sorry, I just meant to correct this information: in gcc 4.x it produces an error preventing build. Even if it doesn't do anything useful, still GCC

[issue12641] Remove -mno-cygwin from distutils

2013-05-24 Thread Oscar Benjamin
Oscar Benjamin added the comment: Renato Silva added the comment: I must note that GCC 4.x *does* support -mno-cygwin, at least until 4.4, and at least the MinGW version. MinGW has never supported the -mno-cygwin option. It has simply tolerated it. The option never did anything useful

[issue12641] Remove -mno-cygwin from distutils

2013-05-23 Thread Oscar Benjamin
Oscar Benjamin added the comment: I have written a function that can be used to determine if the gcc that distutils will use is from Cygwin or MinGW: def is_cygwingcc(): '''Try to determine if the gcc that would be used is from cygwin.''' out = Popen(['gcc', '-dumpmachine'], shell=True
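One way such a check could be written with subprocess.check_output is sketched below. The split into `looks_like_cygwin` plus a wrapper, the `gcc` parameter, and the choice to return False when gcc cannot be run are all assumptions made for this sketch, not details taken from the truncated snippet above.

```python
import subprocess


def looks_like_cygwin(dumpmachine_output):
    """Return True if `gcc -dumpmachine` output names a Cygwin target."""
    return dumpmachine_output.strip().endswith(b"cygwin")


def is_cygwingcc(gcc="gcc"):
    """Try to determine if the gcc that would be used is from Cygwin."""
    try:
        out = subprocess.check_output([gcc, "-dumpmachine"])
    except (OSError, subprocess.CalledProcessError):
        # Assumption: if gcc cannot be run at all, report "not Cygwin".
        return False
    return looks_like_cygwin(out)


print(looks_like_cygwin(b"i686-pc-cygwin\n"))  # a Cygwin target triple
print(looks_like_cygwin(b"mingw32\n"))         # a MinGW target
```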

[issue12641] Remove -mno-cygwin from distutils

2013-05-22 Thread Oscar Benjamin
Oscar Benjamin added the comment: On 22 May 2013 12:43, Martin v. Löwis rep...@bugs.python.org wrote: Am 21.05.13 23:14, schrieb Oscar Benjamin: More generally I think that compiling non-cygwin extensions with cygwin gcc should be altogether deprecated (for Python 3.4 at least). It should

[issue12641] Remove -mno-cygwin from distutils

2013-05-22 Thread Oscar Benjamin
Oscar Benjamin added the comment: On 22 May 2013 13:40, Oscar Benjamin rep...@bugs.python.org wrote: However on further reflection I'm a little reluctant to force an error if I can't *prove* that the setup is broken. After a little more reflection I realise that we could just do

[issue12641] Remove -mno-cygwin from distutils

2013-05-21 Thread Oscar Benjamin
Oscar Benjamin added the comment: I'd really like to get a resolution on this issue so I've tried to gather some more information about this problem by asking some questions in the mingw-users mailing list. The resulting thread can be found here: http://comments.gmane.org

[issue12641] Remove -mno-cygwin from distutils

2013-05-21 Thread Oscar Benjamin
Oscar Benjamin added the comment: On 21 May 2013 17:21, Martin v. Löwis rep...@bugs.python.org wrote: C: Users who have only cygwin gcc 4.x installed For those, the current setup will produce an error message, essentially telling them that they need to fix something (specifically: edit