[Numpy-discussion] Re: New matvec and vecmat functions

2024-01-24 Thread Evgeni Burovski
FWIW, +1 for matvec & vecmat to complement matmat (erm, matmul). Having a
binop where one argument is a matrix and the other is a stack/batch of
vectors is indeed awkward otherwise, and a dedicated function to clearly
distinguish "two matrices" from "a matrix and a batch of vectors" sounds
great from a user perspective.

As to vecmat doing Hermitian conjugation of the vector argument --- I'd be
in favor (because <\psi | \hat H | \psi>), but this is a weak preference.
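
As a concrete illustration of the awkwardness (a sketch, not the proposed API;
with plain matmul one has to add and remove a trailing axis, or reach for einsum):

    import numpy as np

    A = np.random.rand(3, 3)      # a single matrix
    v = np.random.rand(10, 3)     # a stack/batch of 10 vectors

    mv = (A @ v[..., None])[..., 0]               # matrix @ each vector in the stack
    mv_einsum = np.einsum('ij,...j->...i', A, v)  # the same thing via einsum
    assert np.allclose(mv, mv_einsum)

    # The vector-matrix counterpart; the question above is whether the vector
    # argument should be conjugated, as in <psi| H |psi>:
    vm = np.einsum('...i,ij->...j', v.conj(), A)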

Wed, Jan 24, 2024, 21:28 Marten van Kerkwijk :

> > Why do these belong in NumPy? What is the broad field of application of
> these functions? And,
> > does a more general concept underpin them?
>
> Multiplication of a matrix with a vector is about as common as matrix
> with matrix or vector with vector, and not currently easy to do for
> stacks of vectors, so I think the case for matvec is about as strong as
> that for matmul and vecdot.
>
> Arguably, vecmat is slightly less common, though it completes the quad.
>
> -- Marten
>
>


[Numpy-discussion] Re: We've received a NumFOCUS SDG!

2023-11-09 Thread Evgeni Burovski
Congratulations Kai!

Thu, Nov 9, 2023, 05:16 Charles R Harris :

>
>
> On Wed, Nov 8, 2023 at 4:24 PM  wrote:
>
>> Hello NumPy community,
>>
>> I am thrilled to announce that I have been awarded a NumFOCUS Small
>> Development Grant for my work on NumPy-Financial. This grant will enable me
>> to enhance compatibility with NumPy 2.0, attract new contributors, improve
>> testing and documentation, and perform the UFunc rework. I'm looking
>> forward to contributing to the scientific Python community!
>>
>> As part of this grant, I am looking for any NumPy-Financial users' input,
>> especially in regard to setting up some benchmarks of real-world scenarios.
>> If you are currently using NumPy-Financial please get in touch.
>>
>> I hope everyone is doing well.
>>
>> Kai
>> __
>
>
> That's great. Congratulations.
>
> Chuck
>
>>


[Numpy-discussion] Re: New user dtypes and the buffer protocol

2023-07-06 Thread Evgeni Burovski
I wonder if the DLPack protocol can be helpful for these kinds of dtypes?



On Thu, Jul 6, 2023 at 7:56 PM Nathan  wrote:
>
> Hi all,
>
> As you may know, I'm currently working on a variable-width string dtype
using the new experimental user dtype API. As part of this work I'm running
into papercuts that future dtype authors will likely hit and I've been
trying to fix them as I go.
>
> One issue I'd like to raise with the list is that the Python buffer
protocol and the `__array_interface__` protocol support a limited set of
data types.
>
> This leads to three concrete issues I'm working around:
>
>* The `npy` file format uses the type strings defined by the
`__array_interface__` protocol, so any type that doesn't have a type string
defined in that protocol cannot currently be saved [1].
>
> * Cython uses the buffer protocol in its support for NumPy arrays and
in the typed memoryview interface, so any array with a dtype that
doesn't support the buffer protocol cannot be accessed using idiomatic
Cython code [2]. The same issue means Cython can't easily support float16
or datetime dtypes [3].
>
> * Currently new dtypes don't have a way to export a string version of
themselves that numpy can subsequently load (implicitly importing the
dtype). This makes it more awkward to update downstream libraries that
currently treat dtypes as strings.
>
> One way to fix this is to define an ad-hoc extension to the buffer
protocol. Officially, the buffer protocol only supports the format codes
used in the struct module [4]. Unofficially, memoryview doesn't raise a
NotImplementedError if you pass it an invalid format code, only raising an
error when it tries to access the data. This means we can stuff an
arbitrary string into the format code. See the proposal from Sebastian on
the Python Discuss forum [5] and his proof-of-concept [6]. The hardest
issue with this approach is that it's a social problem, requiring
cross-project coordination with at least Cython, and possibly a PEP to
standardize whatever extension to the buffer protocol we come up with.
>
> Another option would be to exchange data using the arrow data format [7],
which already supports many of the kinds of memory layouts custom dtype
authors might want to use and supports defining custom data types [8]. The
big issue here is that NumPy probably can't depend on the arrow C++ library
(I think?) so we would need to write a bunch of code to support arrow data
layouts and data types, but then we would also need to do the same thing on
the Cython side.
>
> Implementing either of these approaches fixes the issues I enumerated
above at the cost of some added complexity. We don't necessarily have to
make an immediate decision for my work to be viable, I can work around most
of these issues, but I think now is probably the time to raise this as an
issue and see if anyone has strong opinions about what NumPy should
ultimately do.
>
> I've raised this on the Cython mailing list to get their take as well [9].
>
> [1] https://github.com/numpy/numpy/issues/24110
> [2] https://github.com/numpy/numpy/issues/18442
> [3] https://github.com/numpy/numpy/issues/4983
> [4] https://docs.python.org/3/library/struct.html#format-strings
> [5]
https://discuss.python.org/t/buffer-protocol-and-arbitrary-data-types/26256
> [6] https://github.com/numpy/numpy/issues/23500#issuecomment-1525103546
> [7] https://arrow.apache.org/docs/format/Columnar.html
> [8] https://arrow.apache.org/docs/format/Columnar.html#extension-types
> [9] https://mail.python.org/pipermail/cython-devel/2023-July/005434.html
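
To make the type-string point above concrete, a minimal illustration for a
built-in dtype (a sketch; the gap described above is that user-defined dtypes
currently have no equivalent string):

    import numpy as np

    a = np.arange(3, dtype=np.float64)
    typestr = a.__array_interface__['typestr']
    print(typestr)               # '<f8' on a little-endian machine
    print(np.dtype(typestr))     # float64 -- round-trips for built-in dtypes

    # The buffer protocol, in turn, officially only knows the struct-module codes:
    print(memoryview(a).format)  # 'd'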


[Numpy-discussion] Re: Giving deprecation of e.g. `float(np.array([1]))` a shot (not 0-d)

2023-04-20 Thread Evgeni Burovski
If parity with PyTorch is any guide, it is nice to have:

In [38]: float(torch.as_tensor([2]))
Out[38]: 2.0

In [39]: float(np.asarray([2]))
Out[39]: 2.0

I guess this boils down to what a scalar really is: is it `scalar.size
== 1` or `scalar.ndim == 0`, or something else?
But that's just a digression, never mind.
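
To make the distinction concrete, a small sketch (the DeprecationWarning only
appears in NumPy versions where the linked PR has landed):

    import numpy as np

    float(np.array(1.0))       # 0-d array: unambiguous, not affected
    float(np.array([1.0]))     # 1-D, single element: the case under discussion
    float(np.array([[1.0]]))   # also works today; both would start warning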


On Thu, Apr 20, 2023 at 7:25 PM Stephan Hoyer  wrote:
>
>
> On Thu, Apr 20, 2023 at 9:12 AM Sebastian Berg  
> wrote:
>>
>> Hi all,
>>
>> Unlike conversions of 0-d arrays via:
>>
>> float(np.array(1))
>>
>> conversions of 1-D or higher dimensional arrays with a single element
>> are a bit strange:
>>
>> float(np.array([1]))
>>
>> And deprecating it has come up often enough with many in favor, but
>> also many worried about the possible annoyance to users.
>> I decided to give the PR a shot; I may have misread the room on it,
>> though:
>>
>> https://github.com/numpy/numpy/pull/10615
>>
>> So if this turns out noisy (or you may simply disagree), I am happy to
>> revert!
>
>
> This looks like a great clean-up to me, thanks for giving this a try!


[Numpy-discussion] Re: Invalid value encountered: how to prevent numpy.where from doing this?

2023-02-18 Thread Evgeni Burovski
FWIW, scipy has a small utility just for this:

https://github.com/scipy/scipy/blob/main/scipy/_lib/_util.py#L36

Not to be used in performance critical loops, of course.

Cheers,

Evgeni
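
For the curious, a minimal sketch of what such a helper does (not the actual
scipy implementation): evaluate the function only where the condition holds and
fill the rest, so the warning is never triggered.

    import numpy as np

    def where_safe(cond, x, fill, f):
        # Apply f only to the elements where cond is True; fill the rest.
        out = np.full(np.shape(x), fill, dtype=float)
        np.place(out, cond, f(np.extract(cond, x)))
        return out

    x = np.array([0.0, 1.0, 2.0])
    print(where_safe(x != 0.0, x, 0.0, lambda v: 1.0 / v))   # [0.  1.  0.5]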


Sat, Feb 18, 2023, 17:02 David Pine :

> I agree.  The problem can be avoided in a very inelegant way by turning
> warnings off before calling where() and turning them back on afterward,
> like this
>
> warnings.filterwarnings("ignore", category=RuntimeWarning)
> result = np.where(x == 0.0, 0.0, 1./x)
> warnings.filterwarnings("always", category=RuntimeWarning)
>
> But it would be MUCH nicer if there were an optional keyword argument in
> the where() call.
>
> Thanks,
> Dave


[Numpy-discussion] Re: Getting scipy.interpolate.pchip_interpolate to return the first derivative of a pchip interpolation

2023-01-22 Thread Evgeni Burovski
The PCHIP algorithm is written out in the SciPy API documentation:
https://docs.scipy.org/doc/scipy/reference/generated/scipy.interpolate.PchipInterpolator.html

In general, there is no guarantee that if you tabulate a function and
interpolate, the derivative of the interpolant matches the derivative of the
underlying function.
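
For reference, the factor discussed in the quoted message below follows from the
chain rule: the grid is in degrees, so d/dx sin(pi*x/180) = (pi/180) * cos(pi*x/180),
and that is what der=1 returns. A quick check (a sketch using the same grid as the
quoted script):

    import numpy as np
    from scipy.interpolate import pchip_interpolate

    x_obs = np.linspace(0.0, 360.0, 51)
    y_obs = np.sin(np.pi * x_obs / 180)
    x = np.linspace(0.0, 360.0, 100)

    dydx = pchip_interpolate(x_obs, y_obs, x, der=1)
    # Rescaling by 180/pi recovers the cosine, up to interpolation error:
    print(np.max(np.abs(dydx * 180 / np.pi - np.cos(np.pi * x / 180))))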

On Sun, Jan 22, 2023 at 12:51 PM Samuel Dupree  wrote:
>
> I believe I know what is going on, but I don't understand why.
>
> The line for the first derivative that failed to coincide with the points in 
> the plot for the cosine is actually the interpolated first derivative scaled 
> by the factor pi/180. When I multiply the interpolated values for the first 
> derivative by 180/pi,  the interpolated first derivative coincides with the 
> points for the cosine as expected.
>
> What I don't understand is how the interpolator came up with the scale factor 
> it did and applied it using pure numbers.
>
> Any thoughts?
>
> Sam Dupree.
>
>
> On 1/21/23 18:04, Samuel Dupree wrote:
>
> I'm running SciPy ver. 1.9.3 under Python ver. 3.9.15  on a Mac Pro (2019) 
> desktop running Mac OSX ver. 13.1 Ventura. The problem I'm having is getting 
> scipy.interpolate.pchip_interpolate to return the first derivative of a pchip 
> interpolation.
>
> The test program I'm using is given below (and attached to this note).
>
>
> import numpy as np
> import matplotlib.pyplot as plt
> from scipy.interpolate import pchip_interpolate
>
> x_observed  = np.linspace(0.0, 360.0, 51)
> y_observed  = np.sin(np.pi*x_observed/180)
> dydx_observed = np.cos(np.pi*x_observed/180)
>
> x = np.linspace(min(x_observed), max(x_observed), num=100)
> y = pchip_interpolate(x_observed, y_observed, x, der=0, axis=0)
> dydx = pchip_interpolate(x_observed, y_observed, x, der=1, axis=0)
>
> plt.plot(x_observed,y_observed, "bo" , label="observation funct")
> plt.plot(x_observed, dydx_observed, "rx" , label="observation deriv")
> plt.plot(x , y, "c-", label="pchip interpolation funct")
> plt.plot(x , dydx , "k-", label="pchip interpolation deriv")
> plt.legend()
> plt.savefig("pchip_example_01.png")
> plt.show()
>
>
> The program generates values of the sine function (y_observed) over the range 
> of 0 to 360 degrees. (x_observed). In a similar fashion, the cosine function 
> (first derivative of the sine function) is generated over the same range 
> (dydx_observed). pchip_interpolate is used to perform the interpolation over 
> a specified range for the function and its first derivative. A composite plot 
> is generated showing the points for the function (the sine) and its first 
> derivative (cosine). The interpolated points overlay the function (sine) as 
> expected. However, the first derivative returned fails to overlay the cosine 
> function. The plot is attached to this note.
>
> Any thoughts or suggestions?
>
> Sam Dupree
>
>


[Numpy-discussion] Re: ANN: SciPy 1.10.0

2023-01-04 Thread Evgeni Burovski
 ``crosstab``, ``pointbiserialr``,
> ``spearmanr``, ``kendalltau``, and ``weightedtau`` have been renamed to
> ``statistic`` and ``pvalue`` for consistency throughout `scipy.stats`.
> Old attribute names are still allowed for backward compatibility.
>   - `scipy.stats.anderson` now returns the parameters of the fitted
> distribution in a `scipy.stats._result_classes.FitResult` object.
>   - The ``plot`` method of `scipy.stats._result_classes.FitResult` now accepts
> a ``plot_type`` parameter; the options are ``'hist'`` (histogram, 
> default),
> ``'qq'`` (Q-Q plot), ``'pp'`` (P-P plot), and ``'cdf'`` (empirical CDF
> plot).
>   - Kolmogorov-Smirnov tests (e.g. `scipy.stats.kstest`) now return the
> location (argmax) at which the statistic is calculated and the variant
> of the statistic used.
>
> - Improved the performance of several `scipy.stats` functions.
>
>   - Improved the performance of `scipy.stats.cramervonmises_2samp` and
> `scipy.stats.ks_2samp` with ``method='exact'``.
>   - Improved the performance of `scipy.stats.siegelslopes`.
>   - Improved the performance of `scipy.stats.mstats.hdquantile_sd`.
>   - Improved the performance of `scipy.stats.binned_statistic_dd` for several
> NumPy statistics, and binned statistics methods now support complex data.
>
> - Added the ``scramble`` optional argument to 
> `scipy.stats.qmc.LatinHypercube`.
>   It replaces ``centered``, which is now deprecated.
> - Added a parameter ``optimization`` to all `scipy.stats.qmc.QMCEngine`
>   subclasses to improve characteristics of the quasi-random variates.
> - Added tie correction to `scipy.stats.mood`.
> - Added tutorials for resampling methods in `scipy.stats`.
> - `scipy.stats.bootstrap`, `scipy.stats.permutation_test`, and
>   `scipy.stats.monte_carlo_test` now automatically detect whether the provided
>   ``statistic`` is vectorized, so passing the ``vectorized`` argument
>   explicitly is no longer required to take advantage of vectorized statistics.
> - Improved the speed of `scipy.stats.permutation_test` for permutation types
>   ``'samples'`` and ``'pairings'``.
> - Added ``axis``, ``nan_policy``, and masked array support to
>   `scipy.stats.jarque_bera`.
> - Added the ``nan_policy`` optional argument to `scipy.stats.rankdata`.
>
>
> ********************
> Deprecated features
> ********************
> - `scipy.misc` module and all the methods in ``misc`` are deprecated in v1.10
>   and will be completely removed in SciPy v2.0.0. Users are suggested to
>   utilize the `scipy.datasets` module instead for the dataset methods.
> - `scipy.stats.qmc.LatinHypercube` parameter ``centered`` has been deprecated.
>   It is replaced by the ``scramble`` argument for more consistency with other
>   QMC engines.
> - `scipy.interpolate.interp2d` class has been deprecated.  The docstring of 
> the
>   deprecated routine lists recommended replacements.
>
> ********************
> Expired Deprecations
> ********************
> - There is an ongoing effort to follow through on long-standing deprecations.
> - The following previously deprecated features are affected:
>
>   - Removed ``cond`` & ``rcond`` kwargs in ``linalg.pinv``
>   - Removed wrappers ``scipy.linalg.blas.{clapack, flapack}``
>   - Removed ``scipy.stats.NumericalInverseHermite`` and removed ``tol`` & 
> ``max_intervals`` kwargs from ``scipy.stats.sampling.NumericalInverseHermite``
>   - Removed ``local_search_options`` kwarg from
> ``scipy.optimize.dual_annealing``.
>
>
> ********************
> Other changes
> ********************
> - `scipy.stats.bootstrap`, `scipy.stats.permutation_test`, and
>   `scipy.stats.monte_carlo_test` now automatically detect whether the provided
>   ``statistic`` is vectorized by looking for an ``axis`` parameter in the
>   signature of ``statistic``. If an ``axis`` parameter is present in
>   ``statistic`` but should not be relied on for vectorized calls, users must
>   pass option ``vectorized==False`` explicitly.
> - `scipy.stats.multivariate_normal` will now raise a ``ValueError`` when the
>   covariance matrix is not positive semidefinite, regardless of which method
>   is called.
>
> ********************
> Authors
> ********************
>
> * Name (commits)
> * h-vetinari (10)
> * Jelle Aalbers (1)
> * Oriol Abril-Pla (1) +
> * Alan-Hung (1) +
> * Tania Allard (7)
> * Oren Amsalem (1) +
> * Sven Baars (10)
> * Balthasar (1) +
> * Ross Barnowski (1)
> * Christoph Baumgarten (2)
> * Peter Bell (2)
> * Sebastian Berg (1)
> * Aaron Berk (1) +
> * boatwrong (1) +
> * boeleman (1) +
> * Jake Bowhay (50)
> * Matthew Brett (4)
> * Evgeni Burovski (93)
> * Matthias Bussonnier (6)
> * Dominic C (2)
> * M

[Numpy-discussion] Re: ANN: NumPy Fellowship Program & Sayed Adel as our first Developer in Residence

2022-12-01 Thread Evgeni Burovski
Congratulations Sayed!

On Fri, Dec 2, 2022 at 2:05 AM Brigitta Sipőcz
 wrote:
>
> Wonderful news, congratulations Sayed!
>
> Brigitta
>
> On Thu, 1 Dec 2022 at 13:18, Ralf Gommers  wrote:
>>
>> Hi all,
>>
>> I'm excited to be able to share this announcement on behalf of the NumPy 
>> Steering Council. We have created a new program, the NumPy Fellowship 
>> Program, and offered Sayed Adel the very first Developer in Residence role. 
>> Sayed starts his 1 year tenure in that role today, and we are really looking 
>> forward to him working on NumPy full-time.
>>
>> We wrote a blog post about the program, and why we offered the role to 
>> Sayed: https://blog.scientific-python.org/numpy/fellowship-program/. I've 
>> copied the blog post content at the end of this email.
>>
>> In addition, here is some more detail on NumPy project finances that didn't 
>> make it into the blog post (which is likely to have a wider audience than 
>> the readership of this mailing list), but is quite relevant to share here:
>>
>> Over the past decade, NumPy has accumulated individual donations as well as 
>> payments from Tidelift. NumPy has been a fiscally sponsored project of 
>> NumFOCUS for a decade - meaning that NumFOCUS, as a 501(c)3 nonprofit, 
>> administers funds for NumPy. As a result, NumPy has accumulated funds for a 
>> long time - and those are now transparently administered on Open Collective. 
>> There you will see a "general fund", currently with a ~$23,000 balance, and 
>> two open "projects" with committed funding - one for the active CZI grant we 
>> have, and one for this new Fellowship Program. Guidelines for using those 
>> funds are described in 
>> https://numpy.org/neps/nep-0048-spending-project-funds.html.
>>
>> Finally it is worth pointing out that we are now able to solicit donations 
>> on Open Collective, and have added contribution tiers on the front page of 
>> https://opencollective.com/numpy. Until now, we have never actively 
>> solicited donations as a project, because the accounting support and 
>> transparent financial reporting was not in place. That has changed now 
>> though, so we are hoping that with guidelines to spend funds plus a concrete 
>> fellowship program that we're expecting to be quite impactful, we are now 
>> able to confidently tell people that if they donate to NumPy, we will manage 
>> their contribution well and translate it into more time for someone on the 
>> NumPy team to make NumPy better.
>>
>> Cheers,
>> Ralf
>>
>>
>> blog post content:
>>
>> The NumPy team is excited to announce the launch of the NumPy Fellowship 
>> Program and the appointment of Sayed Adel (@seiko2plus) as the first NumPy 
>> Developer in Residence. This is a significant milestone in the history of 
>> the project: for the first time, NumPy is in a position to use its project 
>> funds to pay for a full year of maintainer time. We believe that this will 
>> be an impactful program that will contribute to NumPy’s long-term 
>> sustainability as a community-driven open source project.
>>
>> Sayed has been making major contributions to NumPy since the start of 2020, 
>> in particular around computational performance. He is the main author of the 
>> NumPy SIMD architecture (NEP 38, docs), generously shared his knowledge of 
>> SIMD instructions with the core developer team, and helped integrate the 
>> work of various volunteer and industry contributors in this area. As a 
>> result, we’ve been able to expand support to multiple CPU architectures, 
>> integrating contributions from IBM, Intel, Apple, and others, none of which 
>> would have been possible without Sayed. Furthermore, when NumPy tentatively 
>> started using C++ in 2021, Sayed was one of the proponents of the move and 
>> helped with its implementation.
>>
>> The NumPy Steering Council sees Sayed’s appointment to this role as both 
>> recognition of his past outstanding contributions as well as an opportunity 
>> to continue improving NumPy’s computational performance. In the next 12 
>> months, we’d like to see Sayed focus on the following:
>>
>> SIMD code maintenance,
>> code review of SIMD contributions from others,
>> performance-related features,
>> sharing SIMD and C++ expertise with the team and growing a NumPy 
>> sub-team around it,
>> SIMD build system migration to Meson,
>> and wherever else Sayed’s interests take him.
>>
>> “I’m both happy and nervous: this is a great opportunity, but also a 
>> great responsibility,” said Sayed in response to his appointment.
>>
>> The funds for the NumPy Fellowship Program come from a partnership with 
>> Tidelift and from individual donations. We sincerely thank both Tidelift and 
>> everyone who donated to the project—without you, this program would not be 
>> possible! We also acknowledge the CPython Developer-in-Residence and the 
>> Django Fellowship programs, which served as inspiration for this program.
>>
>> Sayed officially starts as the NumPy 

[Numpy-discussion] Re: plan for moving to Meson

2022-11-11 Thread Evgeni Burovski
> (2) a more important one, the `.c.src` format. In SciPy we got rid of it, and 
> we're not going to make Meson understand an ad-hoc templating method that 
> only NumPy uses. So we have two choices: also get rid of it, or write a new 
> custom preprocessing utility for NumPy's Meson build. I think we have too 
> much code using it to remove it, so probably should go with the "new utility" 
> option. But in case we did want to get rid of it, now would be a good time.

As a comment from the peanut gallery, where the project board is not
even visible (https://github.com/orgs/numpy/projects/7/views/7 404s
for me --- it would be perfectly understandable if you prefer to keep
it visible to select individuals!), and this has probably been discussed
before: any thoughts on changing it to e.g. Tempita templating?
Translating .c.src templates to .c.in is straightforward if tedious,
as e.g. the SciPy transition showed.
This is of course quite a bit of work, but so is a new utility.
Again, just throwing it out there :-).

Cheers,

Evgeni
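
For the curious, a rough illustration of Tempita-style templating (a sketch; the
import path assumes the Tempita copy bundled with Cython --- the standalone
tempita package works the same way):

    from Cython import Tempita   # assumption: using the copy shipped with Cython

    src = Tempita.Template("""
    {{for t in ['float', 'double']}}
    static {{t}} add_{{t}}({{t}} a, {{t}} b) { return a + b; }
    {{endfor}}
    """).substitute()
    print(src)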


[Numpy-discussion] Re: Representation of NumPy scalars

2022-09-09 Thread Evgeni Burovski
A naive question: what actually are the differences, and what does an end user
need to worry about when mixing Python scalars and NumPy scalars? Same question
for a library author.
Or is it mainly about fixed-width integers vs Python integers?

Cheers,

Evgeni
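
To make the question concrete, a few user-visible differences as of this thread
(a sketch, not an exhaustive list):

    import numpy as np

    a = np.float64(0.1)
    print(repr(a), repr(0.1))    # '0.1' and '0.1' -- currently indistinguishable
    print(isinstance(a, float))                # True: np.float64 subclasses float
    print(isinstance(np.float32(0.1), float))  # False

    x = np.int64(2**62)
    print(x * 4)        # fixed width: wraps around (with a RuntimeWarning)
    print(2**62 * 4)    # Python int: arbitrary precision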

Fri, Sep 9, 2022, 09:58 Kevin Sheppard :

> +1 from me. They are a frequent source of confusion when starting, and
> there appear to be far fewer now than in earlier releases. It also might
> make it easier to spot any inadvertent scalars coming out if these could be
> Python floats.
>
> Kevin
>
> On Fri, Sep 9, 2022, 07:23 Stefan van der Walt 
> wrote:
>
>> I am in favor of such a change. It will make what is returned more
>> transparent to users (and reduce confusion for newcomers).
>>
>> With NEP50, we're already adopting a philosophy of explicit scalar usage
>> anyway: no longer pretending or trying to make transparent that Python
>> floats and NumPy floats are the same.
>>
>> No one *actually* round-trips objects via repr, but if a user could look
>> at a result and know how to construct the object, that is an improvement.
>>
>> Stéfan
>>
>> On Thu, Sep 8, 2022, at 22:26, Matti Picus wrote:
>> > On 9/9/22 04:15, Warren Weckesser wrote:
>> >> ...
>> >> To quote from https://docs.python.org/3/library/functions.html#repr:
>> >>
>> >>> For many types, this function makes an attempt to return a string
>> >>> that would yield an object with the same value when passed to eval();
>> >> Sebastian, is this an explicit goal of the change?  (Personally, I've
>> >> gotten used to not taking this too seriously, but my world view is
>> >> biased by the long-term use of NumPy, which has never followed this
>> >> guideline.)
>> >>
>> >> If that is a goal, than the floating point types with precision
>> >> greater than double precision will need to display the argument of the
>> >> type as a string.  For example, the following is run on a platform
>> >> where numpy.longdouble is extended precision (80 bits):
>> >>
>> >> ```
>> >> In [161]: longpi = np.longdouble('3.14159265358979323846')
>> >>
>> >> In [162]: longpi
>> >> Out[162]: 3.1415926535897932385
>> >>
>> >> In [163]: np.longdouble(3.1415926535897932385)  # Argument is parsed
>> >> as 64 bit float
>> >> Out[163]: 3.141592653589793116
>> >>
>> >> In [164]: np.longdouble('3.1415926535897932385')  # Correctly
>> >> reproduces the longdouble
>> >> Out[164]: 3.1415926535897932385
>> >> ```
>> >>
>> >> Warren
>> >
>> >
>> > As others have mentioned, the change will greatly enhance UX at the
>> cost
>> > of documentation cleanups. While the representation may not be
>> perfectly
>> > roundtrip-able, I think it still is an improvement and worthwhile.
>> > Elsewhere I have suggested we need more documentation around
>> > array/scalar printing, perhaps that would be a place to mention the
>> > limitations of string representations.
>> >
>> > Matti


[Numpy-discussion] Re: Future of numpy.distutils

2022-06-13 Thread Evgeni Burovski
Hi Jerome,

Yeah, don't read too much into my choice of the PEP 517 wrapper ---
IIRC I just did "monkey see, monkey do" on
https://github.com/FRidh/mesonpep517examples; it worked for my simple
and limited use case and that was it. Last time I checked, though,
mesonpep517 did not work on Windows (it might have been fixed already;
I will check at some point). So this depends on your needs, I guess.

Evgeni



On Mon, Jun 13, 2022 at 12:53 PM Jerome Kieffer  wrote:
>
> Hi Evgeni,
>
> Thanks for your input. Apparently, your project uses `meson-pep517`
> while SciPy uses `meson-python` for interfacing Meson with the Python
> side of the build.
>
> For now, I am not settled on one version or another but among the
> python community there should be one and only one obvious way to do
> things ...
>
> Meson does its job properly: the binary extensions are properly
> compiled. But neither for my code nor for SciPy are those shared libraries
> shipped within the wheel.
> There are several warnings like this (and many more when building scipy):
>
> WARNING Using heuristics to map files to wheel, this may result in incorrect 
> locations
> WARNING File could not be mapped to an equivalent wheel directory: 
> /tmp/build-via-sdist-jsg87qg9/fabio-0.15.0-a0/.mesonpy-5bwmoite/install/usr/lib/python3/dist-packages/fabio/ext/cf_io.cpython-39-x86_64-linux-gnu.so
>  ({moduledir_shared}/cf_io.cpython-39-x86_64-linux-gnu.so)
>
> Is this bug fixed in any "unreleased" version of `meson-python`, which
> would explain why this works for some people but not for me?
> The simplest is probably to open a bug report there.
>
> Cheers,
>
> Jerome


[Numpy-discussion] Re: Future of numpy.distutils

2022-06-10 Thread Evgeni Burovski
> I can confirm that migrating a small project (still including some
> cython) took less than a day to get boot-strapped.
> Not everything works but the structure is there and it kind-of works.


Speaking of small projects with no Fortran or generated sources,
just some Cython/C++, here's a small example:
https://github.com/ev-br/mc_lib
Since it's way simpler than SciPy, it might be easier to follow. Most
likely you have already figured it all out :-).



[Numpy-discussion] Re: Automatic formatters for only changed lines

2022-05-08 Thread Evgeni Burovski
Mind sharing your yapf config Juan?


Sun, May 8, 2022, 08:40 Juan Nunez-Iglesias :

> With the caveat that I am just an interested bystander, I would like to
> point back to yapf as an alternative. I agree with what has already been
> echoed by the majority of the community, that setting *some* auto-formatter
> is a huge benefit for both review and writing sanity. I very much disagree
> with some of the other choices that black makes, while in contrast I find
> that I can get yapf to write code that I find very close to what I would
> naturally write as an old-fashioned Python style pedant.
>
> I might have already pointed to these lines of code
> 
> that black produced for napari that I find abhorrent:
>
> (
> custom_colormap,
> label_color_index,
> ) = color_dict_to_colormap(self.color)
>
> I can confirm that this hurts readability because recently I and another
> experienced developer stared just a couple of lines downstream for a hot
> minute wondering where the hell "label_color_index" was defined. It is also
> more lines of code than
>
> custom_colormap, label_color_index = (
> color_dict_to_colormap(self.color)
> )
>
> (Personally I really like the pep8 recommendation on hanging indents,
> "further indentation should be used to clearly distinguish itself as a
> continuation line," which black ignores.)
>
> Anyway, didn't mean to start a flame war, just to express my preference
> for yapf and that it would therefore make me very happy to see NumPy adopt
> it as a counteracting force to the monopoly that black is developing. I do
> strongly agree that black is preferable to no formatter. In short my
> recommendation order would be:
>
> 1. Use yapf;
> 2. Use black;
> 3. Don't use a formatter.
>
> Juan.
>
> On Sat, 7 May 2022, at 3:20 PM, Peter Cock wrote:
>
> On Fri, May 6, 2022 at 4:59 PM Charles R Harris 
> wrote:
>
>
> Heh. I think automatic formatting is the future and was happy to see the
> PR. The only question is if black is the way we want to go. IIRC, the main
> sticking points were
>
>- Line length (<= 79).
>- Quotation mark changes (noisy).
>- Formatting of  '*', '**', and '/'
>
> Even if the result isn't perfect by our current standards, I suspect we
> will get used to it and just not worry anymore.
>
>
> On that note, in black v22.1.0 (the first non-beta release), one of the
> changes was to the ** operator to better suit mathematical conventions:
>
>  "Remove spaces around power operators if both operands are simple"
> https://github.com/psf/black/pull/2726
>
> Either way I agree with you, most people seem to get used to black and
> stop worrying about it.
>
> Peter
>


[Numpy-discussion] Re: ANN: SciPy 1.8.0

2022-02-05 Thread Evgeni Burovski
Congratulations Tyler, and a heartfelt thank you for pushing this one
through!

Sun, Feb 6, 2022, 04:19 Charles R Harris :

>
>
> On Sat, Feb 5, 2022 at 5:35 PM Tyler Reddy 
> wrote:
>
>> Hi all,
>>
>> On behalf of the SciPy development team, I'm pleased to announce the
>> release of SciPy 1.8.0.
>>
>>
> Yay! Congratulations on working through all the problems. Some releases
> are just troublesome.
>
> Chuck


[Numpy-discussion] Re: An article on numpy data types

2021-12-28 Thread Evgeni Burovski
Very nice overview!

One question and one suggestion:

1. Is integer wraparound guaranteed for signed ints, or is it an
implementation detail? For unsigned ints, sure, it's straight from the C
standard; what about signed types, though?

2. It'd be nice to explicitly stress that dtype=float corresponds to a C
double, not a C float type. This frequently trips up people trying to
interface with C or Cython (in my experience).
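
Quick checks of both points (a sketch; per point 1, the signed overflow shown is
what NumPy does in practice, whether or not it is formally guaranteed):

    import numpy as np

    # Point 2: dtype=float is the C double (64 bits), not the C float (32 bits).
    print(np.dtype(float), np.dtype(float).itemsize)   # float64 8
    print(np.dtype(np.float32).itemsize)               # 4

    # Point 1: signed wraparound as observed in practice.
    print(np.int8(127) + np.int8(1))   # -128, with a RuntimeWarning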

Tue, Dec 28, 2021, 11:12 Lev Maximov :

> >I think timedelta64 is missing.  Is that intentional?
> Yes, thank you! It has stuck in my todo list. Still thinking on the best
> way to include it.
>
> Btw, does it make sense to include the masked arrays? I know Pandas uses
> something
> like a masked array for representing null values in the integer columns.
> Does anyone use
> NumPy masked arrays nowadays?
>
>
> On Tue, Dec 28, 2021 at 3:11 AM Eric Firing  wrote:
>
>> On 2021/12/27 9:32 AM, Lev Maximov wrote:
>> >  > I'm surprised no one has mentioned it already: int and uint are
>> > reversed in the first table.
>> > Not anymore! I know I'm susceptible to this type of blunders )
>> >
>> > Thank you for your kind attention!
>> >
>> > I've made a few more fixes here and there and added a couple of
>> > illustrations. Looks more or less finished to me.
>> > Giving it away to the editors.
>> >
>> > Best regards,
>> > Lev
>> >
>>
>> Lev,
>>
>> I think timedelta64 is missing.  Is that intentional?
>>
>> Eric


Re: [Numpy-discussion] reducing effort spent on wheel builds?

2021-07-15 Thread Evgeni Burovski
> However, I think reducing the CI dedication (+maintainer effort towards
that) to those minority platforms would inevitably increase the risk of
numpy/scipy failing to work on them, and possibly risk not being
installable on them. Would we want to drop them without having a positive
assurance that the minority platform/package maintainers would keep
checking for numpy/scipy health


Nothing blocks a platform champion from running the CI on their scipy fork.
This way, the burden is on them, not on individual contributors (including
first-time contributors who only deal with common platforms).


Thu, Jul 15, 2021, 15:22 Andrew Nelson :

> I'd be +1 on reducing the number of wheels to reduce maintainer effort
> (esp. the 32 bit builds). However, I think reducing the CI dedication
> (+maintainer effort towards that) to those minority platforms would
> inevitably increase the risk of numpy/scipy failing to work on them, and
> possibly risk not being installable on them. Would we want to drop them
> without having a positive assurance that the minority platform/package
> maintainers would keep checking for numpy/scipy health; I guess one can
> make that choice when you know just how many times those exotic wheels are
> downloaded (modulo CI runs).
>
> I'd like to see sdists being kept on PyPI. Those minority platforms may
> keep patching scipy/numpy source to keep it installable from source, and
> pip knows where to find it. I suspect the majority platforms will have all
> the wheels they need, so only the small set of exotic platforms will use
> sdist, with those users being more capable of dealing with the aftermath.
>
> On Thu, 15 Jul 2021 at 20:53, Evgeni Burovski 
> wrote:
>
>> FWIW, here's a big fat +1 from me for spreading the load. I'd even
>> advocate for trimming the CI and going for "We officially support this
>> (small) list of platforms and gladly accept patches for anything else
>> as long as they do not break the officially supported ones". ISTM the
>> list of supported platforms (both wheels and CI) has grown too large
>> and seems to only grow in time.
>>
>> The user requests are perfectly understandable, everyone wants to be
>> in the press-the-button-and-it-works world, but the maintainer/RM
>> effort seems to be crossing over to being too much.
>>
>> A perfect picture in this regard is probably what we had a couple of
>> years ago with Gohlke Windows wheels. Christoph was doing a great job
>> maintaining the wheels and sending patches. For more exotic platforms,
>> there simply has to be a champion. Or if some entity wants to fund the
>> work for some platform, great --- this should also live somewhere
>> outside of the main repo/CI/pypi.
>>
>> Whether to upload sdists to PyPI --- this is a bit orthogonal, but
>> maybe it's indeed best to only upload the small set of wheels.
>> sdists are just stored in the GH releases and it's not a big deal to
>> grab them from GH (or even pip install from the release tag directly).
>> This would probably improve user experience too (no more cryptic
>> errors from attempting to compile from source on an unsuspecting
>> user).
>>
>> My 2p,
>>
>> Evgeni
>>
>> On Thu, Jul 15, 2021 at 1:22 PM Ralf Gommers 
>> wrote:
>> >
>> > Hey all,
>> >
>> > This whole thread is quite interesting:
>> https://twitter.com/zooba/status/1415440484181417998. Given how much
>> effort we are spending on really niche wheel builds, I’m wondering if we
>> should just draw a line somewhere:
>> >
>> > we do what we do now for the main platforms: Windows, Linux (x86,
>> aarch64), macOS, *but*:
>> > no wheels for ppc64le
>> > no wheels for Alpine Linux
>> > no wheels for PyPy
>> > no wheels for Raspberry Pi, AIX or whatever other niche thing comes
>> next.
>> > drop 32-bit Linux in case it is becoming an annoyance.
>> >
>> > This is not an actual proposal (yet) and I should sleep on this some
>> more, but I've seen Chuck and Matti burn a lot of time on the numpy-wheels
>> repo again recently, and I've done the same for SciPy. The situation is not
>> very sustainable and needs a rethink.
>> >
>> > The current recipe is "someone who cares about a platform writes a PEP,
>> then pip/wheel add a platform tag for it (very little work), and then the
>> maintainers of each Python package are now responsible for wheel builds (a
>> ton of work)". Most of these platforms have package managers, which are all
>> more capable than pip et al., and if they don't then wheels can be hosted
>> elsewhere (example:

Re: [Numpy-discussion] reducing effort spent on wheel builds?

2021-07-15 Thread Evgeni Burovski
FWIW, here's a big fat +1 from me for spreading the load. I'd even
advocate for trimming the CI and going for "We officially support this
(small) list of platforms and gladly accept patches for anything else
as long as they do not break the officially supported ones". ISTM the
list of supported platforms (both wheels and CI) has grown too large
and seems to only grow in time.

The user requests are perfectly understandable, everyone wants to be
in the press-the-button-and-it-works world, but the maintainer/RM
effort seems to be crossing over to being too much.

A perfect picture in this regard is probably what we had a couple of
years ago with Gohlke Windows wheels. Christoph was doing a great job
maintaining the wheels and sending patches. For more exotic platforms,
there simply has to be a champion. Or if some entity wants to fund the
work for some platform, great --- this should also live somewhere
outside of the main repo/CI/pypi.

Whether to upload sdists to PyPI --- this is a bit orthogonal, but
maybe it's indeed best to only upload the small set of wheels.
sdists are just stored in the GH releases and it's not a big deal to
grab them from GH (or even pip install from the release tag directly).
This would probably improve user experience too (no more cryptic
errors from attempting to compile from source on an unsuspecting
user).

My 2p,

Evgeni

On Thu, Jul 15, 2021 at 1:22 PM Ralf Gommers  wrote:
>
> Hey all,
>
> This whole thread is quite interesting: 
> https://twitter.com/zooba/status/1415440484181417998. Given how much effort 
> we are spending on really niche wheel builds, I’m wondering if we should just 
> draw a line somewhere:
>
> we do what we do now for the main platforms: Windows, Linux (x86, aarch64), 
> macOS, *but*:
> no wheels for ppc64le
> no wheels for Alpine Linux
> no wheels for PyPy
> no wheels for Raspberry Pi, AIX or whatever other niche thing comes next.
> drop 32-bit Linux in case it is becoming an annoyance.
>
> This is not an actual proposal (yet) and I should sleep on this some more, 
> but I've seen Chuck and Matti burn a lot of time on the numpy-wheels repo 
> again recently, and I've done the same for SciPy. The situation is not very 
> sustainable and needs a rethink.
>
> The current recipe is "someone who cares about a platform writes a PEP, then 
> pip/wheel add a platform tag for it (very little work), and then the 
> maintainers of each Python package are now responsible for wheel builds (a 
> ton of work)". Most of these platforms have package managers, which are all 
> more capable than pip et al., and if they don't then wheels can be hosted 
> elsewhere (example: https://www.piwheels.org/). And then there's Conda, Nix, 
> Spack, etc. too of course.
>
> Drawing a line somewhere distributes the workload, where packagers who care 
> about some platform and have better tools at hand can do the packaging, and 
> maintainers can go do something with more impact like write new code or 
> review PRs.
>
> 
>
> Cheers,
> Ralf
>


Re: [Numpy-discussion] is_triangular, is_diagonal, is_symmetric et al. in NumPy or SciPy linalg

2021-06-30 Thread Evgeni Burovski
Hi Ilhan,

Overall I think something like this would be great. However, I wonder
if you considered having a specialized container with a structure tag
instead of trying to discover the structure. If it's a container, it
can neatly wrap various LAPACK storage schemes and dispatch to the
appropriate LAPACK functionality --- possibly even sparse storage
schemes. And it seems a bit more robust than trying to discover the
structure (e.g., what about off-band elements of ~1e-16?).

The next question is of course if this should live in scipy/numpy
.linalg or as a separate repo, at least for some time (maybe in the
scipy organization?). So that it can iterate faster, among other
things.
(I'd be interested in contributing FWIW)

Cheers,

Evgeni
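
A rough sketch of the tagged-container idea mentioned above (all names are
hypothetical, and the dispatch is deliberately minimal):

    import numpy as np
    from scipy.linalg import solve_triangular

    class TaggedMatrix:
        # Wrap an array with a declared structure tag so that solvers can
        # dispatch without re-discovering the structure.
        def __init__(self, data, structure=None):
            self.data = np.asarray(data)
            self.structure = structure   # e.g. 'lower_triangular', 'symmetric'

    def solve(a, b):
        # Dispatch on the declared structure instead of scanning the entries.
        if isinstance(a, TaggedMatrix) and a.structure == 'lower_triangular':
            return solve_triangular(a.data, b, lower=True)
        return np.linalg.solve(getattr(a, 'data', a), b)

    L = np.tril(np.random.rand(4, 4)) + np.eye(4)
    b = np.random.rand(4)
    assert np.allclose(solve(TaggedMatrix(L, 'lower_triangular'), b),
                       np.linalg.solve(L, b))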


On Wed, Jun 30, 2021 at 1:22 AM Ilhan Polat  wrote:
>
> Dear all,
>
> I'm writing some helper Cythpm functions for scipy.linalg which is kinda 
> performant and usable. And there is still quite some wiggle room for more.
>
> In many linalg routines there is a lot of performance benefit if the
> structure can be discovered in a cheap and reliable way at the outset. For
> example, if symmetric then eig can delegate to eigh, or if triangular then
> triangular solvers can be used in linalg.solve and lstsq, and so forth.
>
> Here is the Cythonized version (to paste into a Jupyter notebook) that discovers
> the lower/upper bandwidth of a square array A. It is usable and competes well with
> A != 0 used as a cheap low-level check (note the latter returns an array, hence
> more cost is involved). There is a higher-level supervisor function that checks
> C-contiguousness and otherwise specializes to different versions of it.
>
> Initial cell
>
> %load_ext Cython
> %load_ext line_profiler
> import cython
> import line_profiler
>
> Then another cell
>
> %%cython
> # cython: language_level=3
> # cython: linetrace=True
> # cython: binding = True
> # distutils: define_macros=CYTHON_TRACE=1
> # distutils: define_macros=CYTHON_TRACE_NOGIL=1
>
> cimport cython
> cimport numpy as cnp
> import numpy as np
> import line_profiler
> ctypedef fused np_numeric_t:
>     cnp.int8_t
>     cnp.int16_t
>     cnp.int32_t
>     cnp.int64_t
>     cnp.uint8_t
>     cnp.uint16_t
>     cnp.uint32_t
>     cnp.uint64_t
>     cnp.float32_t
>     cnp.float64_t
>     cnp.complex64_t
>     cnp.complex128_t
>     cnp.int_t
>     cnp.long_t
>     cnp.longlong_t
>     cnp.uint_t
>     cnp.ulong_t
>     cnp.ulonglong_t
>     cnp.intp_t
>     cnp.uintp_t
>     cnp.float_t
>     cnp.double_t
>     cnp.longdouble_t
>
>
> @cython.linetrace(True)
> @cython.initializedcheck(False)
> @cython.boundscheck(False)
> @cython.wraparound(False)
> cpdef inline (int, int) band_check_internal(np_numeric_t[:, ::1] A):
>     cdef Py_ssize_t n = A.shape[0], lower_band = 0, upper_band = 0, r, c
>     cdef np_numeric_t zero = 0
>
>     for r in xrange(n):
>         # Only bother if outside the existing band:
>         for c in xrange(r - lower_band):
>             if A[r, c] != zero:
>                 lower_band = r - c
>                 break
>
>         for c in xrange(n - 1, r + upper_band, -1):
>             if A[r, c] != zero:
>                 upper_band = c - r
>                 break
>
>     return lower_band, upper_band
>
> Final cell for use-case ---
>
> # Make arbitrary lower-banded array
> n = 50 # array size
> k = 3 # k'th subdiagonal
> R = np.zeros([n, n], dtype=np.float32)
> R[[x for x in range(n)], [x for x in range(n)]] = 1
> R[[x for x in range(n-1)], [x for x in range(1,n)]] = 1
> R[[x for x in range(1,n)], [x for x in range(n-1)]] = 1
> R[[x for x in range(k,n)], [x for x in range(n-k)]] = 2
>
> Some very haphazardly put together metrics
>
> %timeit band_check_internal(R)
> 2.59 µs ± 84.7 ns per loop (mean ± std. dev. of 7 runs, 10 loops each)
>
> %timeit np.linalg.solve(R, zzz)
> 824 µs ± 6.24 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
>
> %timeit R != 0.
> 1.65 µs ± 43.1 ns per loop (mean ± std. dev. of 7 runs, 100 loops each)
>
> So the worst-case cost is negligible in general (note that the given code is
> slower as it uses the fused type; if I go with a tempita-generated standalone
> version, it is faster).
>
> Two questions:
>
> 1) This is missing np.half/float16 functionality, since any arithmetic with
> float16 might not be reliable, including the nonzero check. Is it safe to view
> it as np.uint16 and use that specialization? I'm not sure about the sign bit,
> hence the question. I can leave this out, since almost all of the linalg suite
> rejects this datatype due to the well-known lack of support.
>
> 2) Should this be in NumPy or SciPy linalg? It is quite relevant to be on 
> SciPy but then again this stuff is purely about array structures. But if the 
> opinion is for NumPy then I would need a volunteer because NumPy codebase 
> flies way above my head.
>
>
> All feedback welcome
>
> Best
> ilhan
>

Re: [Numpy-discussion] Is there a defined way to "unpad" an array, and if not, should there be?

2021-04-13 Thread Evgeni Burovski
Prior art data point: scikit-image.util.crop :

https://scikit-image.org/docs/dev/api/skimage.util.html#skimage.util.crop
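
A minimal sketch of the inverse-of-pad helper being discussed below (the name
`unpad` is hypothetical):

    import numpy as np

    def unpad(arr, pad_width):
        # Undo np.pad, given the same pad_width ((before, after) per axis).
        slices = tuple(slice(before, dim - after)
                       for (before, after), dim in zip(pad_width, arr.shape))
        return arr[slices]

    a = np.arange(12).reshape(3, 4)
    pw = ((1, 2), (3, 1))
    assert np.array_equal(unpad(np.pad(a, pw), pw), a)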



вт, 13 апр. 2021 г., 11:37 Eric Wieser :

> Some other options here that avoid the need for a new function:
>
> * Add a `return_view` argument to `pad`, such that for `padded, orig =
> np.pad(arr, ..., return_view=True)`, `orig == arr` and `orig.base is
> padded`. This is useful if `padded` is modified in place, but less useful
> otherwise. It has the advantage of not having to recompute the slices, as
> pad already has them.
> * Accept a `slice` object directly in `np.pad`; for `sl = np.s_[2:-20,
> 4:-40]`, `padded = np.pad(array, sl)`, we have `padded[sl] == array`.
>
> The second idea seems promising to me, but perhaps there are corner cases
> I haven't thought of that it wouldn't help with.
>
> Eric
>
> On Tue, 13 Apr 2021 at 09:26, Ralf Gommers  wrote:
>
>>
>>
>> On Tue, Apr 13, 2021 at 3:37 AM Jeff Gostick  wrote:
>>
>>> It is great to hear that this might be useful.  I would LOVE to create a
>>> PR on this idea and contribute back to numpy...but let's not get ahead of
>>> ourselves :-)
>>>
>>> Regarding the name, I kinda like "unpad" since it relates directly to
>>> "pad", analogous to "ravel" and "unravel" for instance.  Or maybe "depad".
>>> Although, it's possible to use this on any array, not just a previously
>>> padded one, so maybe tying it too directly to "pad" is not right, in which
>>> case "trim" and "crop" are both perfect.  I must admit that I find it odd
>>> that these functions are not in numpy already.  I just searched the docs
>>> and they show up as keyword args for a few functions but are otherwise
>>> conspicuously absent.  Also, funnily, there is a link to "padding arrays"
>>> but it is basically empty:
>>> https://numpy.org/doc/stable/reference/routines.padding.html.
>>>
>>> Alternatively, I don't hate the idea of passing negative pad widths into
>>> "pad".  I actually tried this at one point to see if there was a hidden
>>> functionality there, to no avail.
>>>
>>> BTW, we just adding a custom "unpad" function to our PoreSpy package for
>>> this purpose:
>>> https://github.com/PMEAL/porespy/blob/dev/porespy/tools/_unpadfunc.py
>>>
>>>
>>>
>>> On Mon, Apr 12, 2021 at 9:15 PM Stephan Hoyer  wrote:
>>>
 On Mon, Apr 12, 2021 at 5:12 PM Jeff Gostick 
 wrote:

> I guess I should have clarified that I was inquiring about proposing a
> 'feature request'.  The github site suggested I open a discussion on this
> list first.  There are several ways to effectively unpad an array as has
> been pointed out, but they all require more than a little bit of thought
> and care, are dependent on array shape, and honestly error prone.  It 
> would
> be very valuable to me to have such a 'predefined' function, so I was
> wondering if (a) I was unaware of some function that already does this and
> (b) if I'm alone in thinking this would be useful.
>

 Indeed, this is a fair question.

 Given that this is not entirely trivial to write correctly, I think it
 would be reasonable to add the inverse operation for pad() into NumPy. This
 is generally better than encouraging users to write their own thing.

 From a naming perspective, here are some possibilities:
 unpad
 trim
 crop

 I think "trim" would be pretty descriptive, probably slightly better
 than "unpad."

>>>
>> I'm not a fan of `trim`. We already have `clip` which sounds similar.
>>
>> `unpad` looks like the only one that's completely unambiguous.
>>
>> `crop` sounds like an image processing function, and what we don't want
>> is something like Pillow's `crop(left, top, right, bottom)`.
>>
>> Cheers,
>> Ralf
>>
>>


Re: [Numpy-discussion] locking np.random.Generator in a cython nogil context?

2020-12-17 Thread Evgeni Burovski
On Thu, Dec 17, 2020 at 1:01 PM Matti Picus  wrote:
>
>
> On 12/17/20 11:47 AM, Evgeni Burovski wrote:
> > Just as a side note, this is not very prominent in the docs, and I'm
> > ready to volunteer to send a doc PR --- I'm only not sure which part
> > of the docs, and would appreciate a pointer.
>
> Maybe here
>
> https://numpy.org/devdocs/reference/random/bit_generators/index.html#seeding-and-entropy
>
> which is here in the sources
>
> https://github.com/numpy/numpy/blob/master/doc/source/reference/random/bit_generators/index.rst#seeding-and-entropy
>
>
> And/or in the SeedSequence docstring documentation
>
> https://numpy.org/devdocs/reference/random/bit_generators/generated/numpy.random.SeedSequence.html#numpy.random.SeedSequence
>
> which is here in the sources
>
> https://github.com/numpy/numpy/blob/master/numpy/random/bit_generator.pyx#L255


Here's the PR, https://github.com/numpy/numpy/pull/18014

Two minor comments, both OT for the PR:

1. The recommendation to seed the generators from the OS --- I've been
bitten by exactly this once. That was a rather exotic combination of a
vendor RNG and a batch queueing system, and some of my runs did end up
with identical random streams. Given that the recommendation stands,
that experience was presumably an isolated incident which no longer
happens with modern generators.

2. Robert's comment that `SeedSequence(..., spawn_key=(num,))` is not
equivalent to `SeedSequence(...).spawn(num + 1)[num]` and that the former
is not recommended. I'm not questioning the recommendation, but then
__repr__ seems to suggest the equivalence:

In [2]: from numpy.random import PCG64, SeedSequence

In [3]: base_seq = SeedSequence(1234)

In [4]: base_seq.spawn(8)
Out[4]:
[SeedSequence(
     entropy=1234,
     spawn_key=(0,),
 ),
 ...]
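
and indeed the derived state matches; the caveat, if I understand it
correctly, is only about the spawn-counter bookkeeping. A quick sketch:

from numpy.random import SeedSequence

parent = SeedSequence(1234)
child_via_spawn = parent.spawn(1)[0]                 # spawn_key=(0,)
child_manual = SeedSequence(1234, spawn_key=(0,))    # same entropy, same key

# identical derived states:
print(child_via_spawn.generate_state(2))
print(child_manual.generate_state(2))

# The hazard: building spawn_key=(0,) by hand does not advance any parent's
# spawn counter, so unrelated code that later calls SeedSequence(1234).spawn(...)
# can hand out the same (0,) child again and the two streams collide.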



Evgeni
___
NumPy-Discussion mailing list
NumPy-Discussion@python.org
https://mail.python.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] locking np.random.Generator in a cython nogil context?

2020-12-17 Thread Evgeni Burovski
On Tue, Dec 15, 2020 at 1:00 AM Robert Kern  wrote:
>
> On Mon, Dec 14, 2020 at 3:27 PM Evgeni Burovski  
> wrote:
>>
>> 
>>
>> > I also think that the lock only matters for Multithreaded code not 
>> > Multiprocess.  I believe the latter pickles and unpickles any Generator 
>> > object (and the underying BitGenerator) and so each process has its own 
>> > version.  Note that when multiprocessing the recommended procedure is to 
>> > use spawn() to generate a sequence of BitGenerators and to use a distinct 
>> > BitGenerator in each process. If you do this then you are free from the 
>> > lock.
>>
>> Thanks. Just to confirm: does using SeedSequence spawn_key arg
>> generate distinct BitGenerators? As in
>>
>> cdef class Wrapper():
>> def __init__(self, seed):
>> entropy, num = seed
>> py_gen = PCG64(SeedSequence(entropy, spawn_key=(spawn_key,)))
>> self.rng = 
>> py_gen.capsule.PyCapsule_GetPointer(capsule, "BitGenerator")# <---
>> this
>>
>> cdef Wrapper rng_0 = Wrapper(seed=(123, 0))
>> cdef Wrapper rng_1 = Wrapper(seed=(123, 1))
>>
>> And then,of these two objects, do they have distinct BitGenerators?
>
>
> The code you wrote doesn't work (`spawn_key` is never assigned). I can guess 
> what you meant to write, though, and yes, you would get distinct 
> `BitGenerator`s. However, I do not recommend using `spawn_key` explicitly. 
> The `SeedSequence.spawn()` method internally keeps track of how many children 
> it has spawned and uses that to construct the `spawn_key`s for its subsequent 
> children. If you play around with making your own `spawn_key`s, then the 
> parent `SeedSequence(entropy)` might spawn identical `SeedSequence`s to the 
> ones you constructed.
>
> If you don't want to use the `spawn()` API to construct the separate 
> `SeedSequence`s but still want to incorporate some per-process information 
> into the seeds (e.g. the 0 and 1 in your example), then note that a tuple of 
> integers is a valid value for the `entropy` argument. You can have the first 
> item be the same (i.e. per-run information) and the second item be a 
> per-process ID or counter.
>
> cdef class Wrapper():
> def __init__(self, seed):
> py_gen = PCG64(SeedSequence(seed))
> self.rng = py_gen.capsule.PyCapsule_GetPointer(capsule, 
> "BitGenerator")
>
> cdef Wrapper rng_0 = Wrapper(seed=(123, 0))
> cdef Wrapper rng_1 = Wrapper(seed=(123, 1))


Thanks Robert!

I indeed typo'd the spawn_key, and indeed the intention is exactly to
include a worker_id into a seed to make sure each worker gets a
separate stream.

The use of the spawn_key was --- as I now finally realize --- a
misunderstanding of your and Kevin's previous replies in
https://mail.python.org/pipermail/numpy-discussion/2020-July/080833.html

So I'm moving my project to use the `SeedSequence((base_seed,
worker_id))` API --- thanks!
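
At the python level the per-worker setup then looks roughly like this
(a sketch; in the actual code the bit generator is handed to the cython
wrapper via its capsule, as in the snippets below):

from numpy.random import PCG64, Generator, SeedSequence

base_seed = 1234

def make_rng(worker_id):
    # one independent stream per worker, keyed by (run, worker)
    return Generator(PCG64(SeedSequence((base_seed, worker_id))))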

Just as a side note, this is not very prominent in the docs, and I'm
ready to volunteer to send a doc PR --- I'm only not sure which part
of the docs, and would appreciate a pointer.
___
NumPy-Discussion mailing list
NumPy-Discussion@python.org
https://mail.python.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] locking np.random.Generator in a cython nogil context?

2020-12-14 Thread Evgeni Burovski


> I also think that the lock only matters for Multithreaded code not 
> Multiprocess.  I believe the latter pickles and unpickles any Generator 
> object (and the underying BitGenerator) and so each process has its own 
> version.  Note that when multiprocessing the recommended procedure is to use 
> spawn() to generate a sequence of BitGenerators and to use a distinct 
> BitGenerator in each process. If you do this then you are free from the lock.

Thanks. Just to confirm: does using SeedSequence spawn_key arg
generate distinct BitGenerators? As in

cdef class Wrapper():
    def __init__(self, seed):
        entropy, num = seed
        py_gen = PCG64(SeedSequence(entropy, spawn_key=(spawn_key,)))
        self.rng = py_gen.capsule.PyCapsule_GetPointer(capsule, "BitGenerator")   # <--- this

cdef Wrapper rng_0 = Wrapper(seed=(123, 0))
cdef Wrapper rng_1 = Wrapper(seed=(123, 1))

And then, of these two objects, do they have distinct BitGenerators?
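
At the pure-python level, the analogous check I have in mind would be,
roughly (a sketch):

from numpy.random import PCG64, Generator, SeedSequence

g0 = Generator(PCG64(SeedSequence(123, spawn_key=(0,))))
g1 = Generator(PCG64(SeedSequence(123, spawn_key=(1,))))
print(g0.random(3))
print(g1.random(3))   # different values would mean distinct streams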

Evgeni


>
>
> Kevin
>
>
>
>
>
> From: Evgeni Burovski
> Sent: Monday, December 14, 2020 2:10 PM
> To: Discussion of Numerical Python
> Subject: Re: [Numpy-discussion] locking np.random.Generator in a cython nogil 
> context?
>
>
>
> On Mon, Dec 14, 2020 at 4:46 PM Kevin Sheppard
>
>  wrote:
>
> >
>
> > You need to reacquire the gil, then you can get the lock and rerelease the 
> > gil.
>
> >
>
> > I think this works (on phone, so untested)
>
> >
>
> > with gil:
>
> > with nogil, lock:
>
> > ..
>
>
>
> Thanks Kevin.
>
> This surely works, but feels seriously weird. Is this the only / the
>
> recommended way?
>
> I can of course adjust the buffer size to amortize the time for the
>
> GIL manipulation, but this really starts looking like a code smell.
>
> My original motivation was to have inner simulation loops python-free.
>
> Most likely, the issue is that I'm not using the Generator correctly?
>
>
>
> Evgeni
>
>
>
>
>
> > On Mon, Dec 14, 2020, 13:37 Evgeni Burovski  
> > wrote:
>
> >>
>
> >> Hi,
>
> >>
>
> >> What would be the correct way of locking the bit generator of
>
> >> np.random.Generator in cython's nogil context?
>
> >> (This is threading 101, surely, so please forgive my ignorance).
>
> >>
>
> >> The docs for extending np.random.Generator in cython
>
> >> (https://numpy.org/doc/stable/reference/random/extending.html#cython)
>
> >> recommend the following idiom for generating uniform variates, where
>
> >> the GIL is released and a Generator-specific lock is held:
>
> >>
>
> >> x = PCG64()
>
> >> rng =  PyCapsule_GetPointer(x.capsule, capsule_name)
>
> >> with nogil, x.lock:
>
> >> rng.next_double(rng.state)
>
> >>
>
> >> What is the correct way of locking it when already in the nogil
>
> >> section (so that x.lock is not accessible)?
>
> >>
>
> >> The use case is a long-running MC process which generates random
>
> >> variates in a tight loop (so the loop is all nogil). In case it
>
> >> matters, I probably won't be using python threads, but may use
>
> >> multiprocessing.
>
> >>
>
> >> Basically,
>
> >>
>
> >> cdef double uniform(self) nogil:
>
> >> if self.idx >= self.buf.shape[0]:
>
> >> self._fill()
>
> >> cdef double value = self.buf[self.idx]
>
> >> self.idx += 1
>
> >> return value
>
> >>
>
> >> cdef void _fill(self) nogil:
>
> >> self.idx = 0
>
> >> # HERE: Lock ?
>
> >> for i in range(self.buf.shape[0]):
>
> >> self.buf[i] = self.rng.next_double(self.rng.state)
>
> >>
>
> >>
>
> >> Thanks,
>
> >> Evgeni
>
> >>
>
> >>
>
> >> P.S. The full cdef class, for completeness:
>
> >>
>
> >> cdef class RndmWrapper():
>
> >> cdef:
>
> >> double[::1] buf
>
> >> Py_ssize_t idx
>
> >> bitgen_t *rng
>
> >> object py_gen  # keep the garbage collector away
>
> >>
>
> >>def __init__(self, seed=(1234, 0), buf_size=4096, bitgen_kind=None):
>
> >> if bitgen_kind is None:
>
> >> bitgen_kind = PCG64
>
> >>
>
> >> # cf Numpy-discussion list, 

Re: [Numpy-discussion] locking np.random.Generator in a cython nogil context?

2020-12-14 Thread Evgeni Burovski
On Mon, Dec 14, 2020 at 4:46 PM Kevin Sheppard
 wrote:
>
> You need to reacquire the gil, then you can get the lock and rerelease the 
> gil.
>
> I think this works (on phone, so untested)
>
> with gil:
> with nogil, lock:
> ...

Thanks Kevin.
This surely works, but feels seriously weird. Is this the only / the
recommended way?
I can of course adjust the buffer size to amortize the time for the
GIL manipulation, but this really starts looking like a code smell.
My original motivation was to have inner simulation loops python-free.
Most likely, the issue is that I'm not using the Generator correctly?

Evgeni


> On Mon, Dec 14, 2020, 13:37 Evgeni Burovski  
> wrote:
>>
>> Hi,
>>
>> What would be the correct way of locking the bit generator of
>> np.random.Generator in cython's nogil context?
>> (This is threading 101, surely, so please forgive my ignorance).
>>
>> The docs for extending np.random.Generator in cython
>> (https://numpy.org/doc/stable/reference/random/extending.html#cython)
>> recommend the following idiom for generating uniform variates, where
>> the GIL is released and a Generator-specific lock is held:
>>
>> x = PCG64()
>> rng =  PyCapsule_GetPointer(x.capsule, capsule_name)
>> with nogil, x.lock:
>> rng.next_double(rng.state)
>>
>> What is the correct way of locking it when already in the nogil
>> section (so that x.lock is not accessible)?
>>
>> The use case is a long-running MC process which generates random
>> variates in a tight loop (so the loop is all nogil). In case it
>> matters, I probably won't be using python threads, but may use
>> multiprocessing.
>>
>> Basically,
>>
>> cdef double uniform(self) nogil:
>> if self.idx >= self.buf.shape[0]:
>> self._fill()
>> cdef double value = self.buf[self.idx]
>> self.idx += 1
>> return value
>>
>> cdef void _fill(self) nogil:
>> self.idx = 0
>> # HERE: Lock ?
>> for i in range(self.buf.shape[0]):
>> self.buf[i] = self.rng.next_double(self.rng.state)
>>
>>
>> Thanks,
>> Evgeni
>>
>>
>> P.S. The full cdef class, for completeness:
>>
>> cdef class RndmWrapper():
>> cdef:
>> double[::1] buf
>> Py_ssize_t idx
>> bitgen_t *rng
>> object py_gen  # keep the garbage collector away
>>
>>def __init__(self, seed=(1234, 0), buf_size=4096, bitgen_kind=None):
>> if bitgen_kind is None:
>> bitgen_kind = PCG64
>>
>> # cf Numpy-discussion list, K.~Sheppard, R.~Kern, June 29,
>> 2020 and below
>> # 
>> https://mail.python.org/pipermail/numpy-discussion/2020-June/080794.html
>> entropy, num = seed
>> seed_seq = SeedSequence(entropy, spawn_key=(num,))
>> py_gen = bitgen_kind(seed_seq)
>>
>> # store the python object to avoid it being garbage collected
>> self.py_gen = py_gen
>>
>> capsule = py_gen.capsule
>> self.rng = PyCapsule_GetPointer(capsule, capsule_name)
>> if not PyCapsule_IsValid(capsule, capsule_name):
>> raise ValueError("Invalid pointer to anon_func_state")
>>
>> self.buf = np.empty(buf_size, dtype='float64')
>> self._fill()
>>
>> @cython.boundscheck(False)
>> @cython.wraparound(False)
>> cdef void _fill(self) nogil:
>> self.idx = 0
>> for i in range(self.buf.shape[0]):
>> self.buf[i] = self.rng.next_double(self.rng.state)
>>
>> @cython.boundscheck(False)
>> @cython.wraparound(False)
>> cdef double uniform(self) nogil:
>> if self.idx >= self.buf.shape[0]:
>> self._fill()
>> cdef double value = self.buf[self.idx]
>> self.idx += 1
>> return value
>> ___
>> NumPy-Discussion mailing list
>> NumPy-Discussion@python.org
>> https://mail.python.org/mailman/listinfo/numpy-discussion
>
> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@python.org
> https://mail.python.org/mailman/listinfo/numpy-discussion
___
NumPy-Discussion mailing list
NumPy-Discussion@python.org
https://mail.python.org/mailman/listinfo/numpy-discussion


[Numpy-discussion] locking np.random.Generator in a cython nogil context?

2020-12-14 Thread Evgeni Burovski
Hi,

What would be the correct way of locking the bit generator of
np.random.Generator in cython's nogil context?
(This is threading 101, surely, so please forgive my ignorance).

The docs for extending np.random.Generator in cython
(https://numpy.org/doc/stable/reference/random/extending.html#cython)
recommend the following idiom for generating uniform variates, where
the GIL is released and a Generator-specific lock is held:

x = PCG64()
rng = <bitgen_t *> PyCapsule_GetPointer(x.capsule, capsule_name)
with nogil, x.lock:
    rng.next_double(rng.state)

What is the correct way of locking it when already in the nogil
section (so that x.lock is not accessible)?

The use case is a long-running MC process which generates random
variates in a tight loop (so the loop is all nogil). In case it
matters, I probably won't be using python threads, but may use
multiprocessing.

Basically,

cdef double uniform(self) nogil:
    if self.idx >= self.buf.shape[0]:
        self._fill()
    cdef double value = self.buf[self.idx]
    self.idx += 1
    return value

cdef void _fill(self) nogil:
    self.idx = 0
    # HERE: Lock ?
    for i in range(self.buf.shape[0]):
        self.buf[i] = self.rng.next_double(self.rng.state)


Thanks,
Evgeni


P.S. The full cdef class, for completeness:

cdef class RndmWrapper():
    cdef:
        double[::1] buf
        Py_ssize_t idx
        bitgen_t *rng
        object py_gen  # keep the garbage collector away

    def __init__(self, seed=(1234, 0), buf_size=4096, bitgen_kind=None):
        if bitgen_kind is None:
            bitgen_kind = PCG64

        # cf Numpy-discussion list, K.~Sheppard, R.~Kern, June 29, 2020 and below
        # https://mail.python.org/pipermail/numpy-discussion/2020-June/080794.html
        entropy, num = seed
        seed_seq = SeedSequence(entropy, spawn_key=(num,))
        py_gen = bitgen_kind(seed_seq)

        # store the python object to avoid it being garbage collected
        self.py_gen = py_gen

        capsule = py_gen.capsule
        self.rng = <bitgen_t *> PyCapsule_GetPointer(capsule, capsule_name)
        if not PyCapsule_IsValid(capsule, capsule_name):
            raise ValueError("Invalid pointer to anon_func_state")

        self.buf = np.empty(buf_size, dtype='float64')
        self._fill()

    @cython.boundscheck(False)
    @cython.wraparound(False)
    cdef void _fill(self) nogil:
        self.idx = 0
        for i in range(self.buf.shape[0]):
            self.buf[i] = self.rng.next_double(self.rng.state)

    @cython.boundscheck(False)
    @cython.wraparound(False)
    cdef double uniform(self) nogil:
        if self.idx >= self.buf.shape[0]:
            self._fill()
        cdef double value = self.buf[self.idx]
        self.idx += 1
        return value
___
NumPy-Discussion mailing list
NumPy-Discussion@python.org
https://mail.python.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] The mu.py script will keep running and never end.

2020-10-11 Thread Evgeni Burovski
Just remove %%timeit

Mon, 12 Oct 2020, 5:56 Hongyi Zhao :

> On Sun, Oct 11, 2020 at 3:42 PM Evgeni Burovski
>  wrote:
> >
> > On Sun, Oct 11, 2020 at 9:55 AM Evgeni Burovski
> >  wrote:
> > >
> > > The script seems to be computing the particle numbers for an array of
> chemical potentials.
> > >
> > > Two ways of speeding it up, both are likely simpler then using dask:
> > >
> > > First: use numpy
> > >
> > > 1. Move constructing mu_all out of the loop (np.linspace)
> > > 2. Arrange the integrands into a 2d array
> > > 3. np.trapz along an axis which corresponds to a single integrand array
> > > (Or avoid the overhead of trapz by just implementing the trapezoid
> formula manually)
> >
> >
> > Roughly like this:
> > https://gist.github.com/ev-br/0250e4eee461670cf489515ee427eb99
>
> I try to run this notebook, but find that all of the
> function/variable/method can't be found at all, if invoke them in
> separate cells. See here for more details:
>
> https://github.com/hongyi-zhao/test/blob/master/fermi_integrate_np.ipynb
>
> Any hints for this problem?
>
> Regards,
> HY
>
> >
> >
> >
> > > Second:
> > >
> > > Move the loop into cython.
> > >
> > >
> > >
> > >
> > > вс, 11 окт. 2020 г., 9:32 Hongyi Zhao :
> > >>
> > >> On Sun, Oct 11, 2020 at 2:02 PM Andrea Gavana <
> andrea.gav...@gmail.com> wrote:
> > >> >
> > >> >
> > >> >
> > >> > On Sun, 11 Oct 2020 at 07.52, Hongyi Zhao 
> wrote:
> > >> >>
> > >> >> On Sun, Oct 11, 2020 at 1:33 PM Andrea Gavana <
> andrea.gav...@gmail.com> wrote:
> > >> >> >
> > >> >> >
> > >> >> >
> > >> >> > On Sun, 11 Oct 2020 at 07.14, Andrea Gavana <
> andrea.gav...@gmail.com> wrote:
> > >> >> >>
> > >> >> >> Hi,
> > >> >> >>
> > >> >> >> On Sun, 11 Oct 2020 at 00.27, Hongyi Zhao <
> hongyi.z...@gmail.com> wrote:
> > >> >> >>>
> > >> >> >>> On Sun, Oct 11, 2020 at 1:48 AM Robert Kern <
> robert.k...@gmail.com> wrote:
> > >> >> >>> >
> > >> >> >>> > You don't need to use vectorize() on fermi(). fermi() will
> work just fine on arrays and should be much faster.
> > >> >> >>>
> > >> >> >>> Yes, it really does the trick. See the following for the
> benchmark
> > >> >> >>> based on your suggestion:
> > >> >> >>>
> > >> >> >>> $ time python mu.py
> > >> >> >>> [-10.999 -10.999 -10.999 ...  20. 20. 20.   ]
> [4.973e-84
> > >> >> >>> 4.973e-84 4.973e-84 ... 4.973e-84 4.973e-84 4.973e-84]
> > >> >> >>>
> > >> >> >>> real0m41.056s
> > >> >> >>> user0m43.970s
> > >> >> >>> sys0m3.813s
> > >> >> >>>
> > >> >> >>>
> > >> >> >>> But are there any ways to further improve/increase efficiency?
> > >> >> >>
> > >> >> >>
> > >> >> >>
> > >> >> >> I believe it will get a bit better if you don’t column_stack an
> array 6000 times - maybe pre-allocate your output first?
> > >> >> >>
> > >> >> >> Andrea.
> > >> >> >
> > >> >> >
> > >> >> >
> > >> >> > I’m sorry, scratch that: I’ve seen a ghost white space in front
> of your column_stack call and made me think you were stacking your results
> very many times, which is not the case.
> > >> >>
> > >> >> Still not so clear on your solutions for this problem. Could you
> > >> >> please post here the corresponding snippet of your enhancement?
> > >> >
> > >> >
> > >> > I have no solution, I originally thought you were calling
> “column_stack” 6000 times in the loop, but that is not the case, I was
> mistaken. My apologies for that.
> > >> >
> > >> > The timings of your approach is highly dependent on the size of
> your “energy” and “DOS” array -
> > >

Re: [Numpy-discussion] The mu.py script will keep running and never end.

2020-10-11 Thread Evgeni Burovski
Sun, 11 Oct 2020, 13:31 Hongyi Zhao :

> On Sun, Oct 11, 2020 at 3:42 PM Evgeni Burovski
>  wrote:
> >
> > On Sun, Oct 11, 2020 at 9:55 AM Evgeni Burovski
> >  wrote:
> > >
> > > The script seems to be computing the particle numbers for an array of
> chemical potentials.
> > >
> > > Two ways of speeding it up, both are likely simpler then using dask:
> > >
> > > First: use numpy
> > >
> > > 1. Move constructing mu_all out of the loop (np.linspace)
> > > 2. Arrange the integrands into a 2d array
> > > 3. np.trapz along an axis which corresponds to a single integrand array
> > > (Or avoid the overhead of trapz by just implementing the trapezoid
> formula manually)
> >
> >
> > Roughly like this:
> > https://gist.github.com/ev-br/0250e4eee461670cf489515ee427eb99
>
> I can't find the cython part suggested by you, i.e., move the loop
> into cython. Furthermore, I also learned that the numpy array is
> optimized and has the performance close to C/C++.
>
>
Basically, it seems pure numpy is OK here and nothing more sophisticated
is needed.
___
NumPy-Discussion mailing list
NumPy-Discussion@python.org
https://mail.python.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] The mu.py script will keep running and never end.

2020-10-11 Thread Evgeni Burovski
On Sun, Oct 11, 2020 at 9:55 AM Evgeni Burovski
 wrote:
>
> The script seems to be computing the particle numbers for an array of 
> chemical potentials.
>
> Two ways of speeding it up, both are likely simpler then using dask:
>
> First: use numpy
>
> 1. Move constructing mu_all out of the loop (np.linspace)
> 2. Arrange the integrands into a 2d array
> 3. np.trapz along an axis which corresponds to a single integrand array
> (Or avoid the overhead of trapz by just implementing the trapezoid formula 
> manually)


Roughly like this:
https://gist.github.com/ev-br/0250e4eee461670cf489515ee427eb99



> Second:
>
> Move the loop into cython.
>
>
>
>
> вс, 11 окт. 2020 г., 9:32 Hongyi Zhao :
>>
>> On Sun, Oct 11, 2020 at 2:02 PM Andrea Gavana  
>> wrote:
>> >
>> >
>> >
>> > On Sun, 11 Oct 2020 at 07.52, Hongyi Zhao  wrote:
>> >>
>> >> On Sun, Oct 11, 2020 at 1:33 PM Andrea Gavana  
>> >> wrote:
>> >> >
>> >> >
>> >> >
>> >> > On Sun, 11 Oct 2020 at 07.14, Andrea Gavana  
>> >> > wrote:
>> >> >>
>> >> >> Hi,
>> >> >>
>> >> >> On Sun, 11 Oct 2020 at 00.27, Hongyi Zhao  
>> >> >> wrote:
>> >> >>>
>> >> >>> On Sun, Oct 11, 2020 at 1:48 AM Robert Kern  
>> >> >>> wrote:
>> >> >>> >
>> >> >>> > You don't need to use vectorize() on fermi(). fermi() will work 
>> >> >>> > just fine on arrays and should be much faster.
>> >> >>>
>> >> >>> Yes, it really does the trick. See the following for the benchmark
>> >> >>> based on your suggestion:
>> >> >>>
>> >> >>> $ time python mu.py
>> >> >>> [-10.999 -10.999 -10.999 ...  20. 20. 20.   ] [4.973e-84
>> >> >>> 4.973e-84 4.973e-84 ... 4.973e-84 4.973e-84 4.973e-84]
>> >> >>>
>> >> >>> real0m41.056s
>> >> >>> user0m43.970s
>> >> >>> sys0m3.813s
>> >> >>>
>> >> >>>
>> >> >>> But are there any ways to further improve/increase efficiency?
>> >> >>
>> >> >>
>> >> >>
>> >> >> I believe it will get a bit better if you don’t column_stack an array 
>> >> >> 6000 times - maybe pre-allocate your output first?
>> >> >>
>> >> >> Andrea.
>> >> >
>> >> >
>> >> >
>> >> > I’m sorry, scratch that: I’ve seen a ghost white space in front of your 
>> >> > column_stack call and made me think you were stacking your results very 
>> >> > many times, which is not the case.
>> >>
>> >> Still not so clear on your solutions for this problem. Could you
>> >> please post here the corresponding snippet of your enhancement?
>> >
>> >
>> > I have no solution, I originally thought you were calling “column_stack” 
>> > 6000 times in the loop, but that is not the case, I was mistaken. My 
>> > apologies for that.
>> >
>> > The timings of your approach is highly dependent on the size of your 
>> > “energy” and “DOS” array -
>>
>> The size of the “energy” and “DOS” array is Problem-related and
>> shouldn't be reduced arbitrarily.
>>
>> > not to mention calling trapz 6000 times in a loop.
>>
>> I'm currently thinking on parallelization the execution of the for
>> loop, say, with joblib <https://github.com/joblib/joblib>, but I still
>> haven't figured out the corresponding codes. If you have some
>> experience on this type of solution, could you please give me some
>> more hints?
>>
>> >  Maybe there’s a better way to do it with another approach, but at the 
>> > moment I can’t think of one...
>> >
>> >>
>> >>
>> >> Regards,
>> >> HY
>> >> >
>> >> >>
>> >> >>
>> >> >>>
>> >> >>>
>> >> >>> Regards,
>> >> >>> HY
>> >> >>>
>> >> >>> >
>> >> >>> > On Sat, Oct 10, 2020, 8:23 AM Hongyi Zhao  
>> >> >>> > wrote:
>> >> >>> >>
>> >

Re: [Numpy-discussion] The mu.py script will keep running and never end.

2020-10-11 Thread Evgeni Burovski
The script seems to be computing the particle numbers for an array of
chemical potentials.

Two ways of speeding it up, both are likely simpler than using dask:

First: use numpy

1. Move constructing mu_all out of the loop (np.linspace)
2. Arrange the integrands into a 2d array
3. np.trapz along an axis which corresponds to a single integrand array
(Or avoid the overhead of trapz by just implementing the trapezoid formula
manually)
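
Roughly, in code (a sketch; the `energy` and `DOS` arrays below are
placeholders standing in for the real data read from the DFT output):

import numpy as np

def fermi(E, mu, kT):
    # vectorized: E and mu only need to broadcast against each other
    with np.errstate(over="ignore"):   # exp() overflows harmlessly to inf
        return 1.0 / (np.exp((E - mu) / kT) + 1.0)

energy = np.linspace(-11.0, 20.0, 5000)   # placeholder
DOS = np.ones_like(energy)                # placeholder
kT = 0.025

mu_all = np.linspace(-11.0, 20.0, 6000)   # built once, outside any loop

# one (n_mu, n_energy) array of integrands, one vectorized trapz call
integrands = DOS * fermi(energy[np.newaxis, :], mu_all[:, np.newaxis], kT)
N_of_mu = np.trapz(integrands, energy, axis=1)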

Second:

Move the loop into cython.




Sun, 11 Oct 2020, 9:32 Hongyi Zhao :

> On Sun, Oct 11, 2020 at 2:02 PM Andrea Gavana 
> wrote:
> >
> >
> >
> > On Sun, 11 Oct 2020 at 07.52, Hongyi Zhao  wrote:
> >>
> >> On Sun, Oct 11, 2020 at 1:33 PM Andrea Gavana 
> wrote:
> >> >
> >> >
> >> >
> >> > On Sun, 11 Oct 2020 at 07.14, Andrea Gavana 
> wrote:
> >> >>
> >> >> Hi,
> >> >>
> >> >> On Sun, 11 Oct 2020 at 00.27, Hongyi Zhao 
> wrote:
> >> >>>
> >> >>> On Sun, Oct 11, 2020 at 1:48 AM Robert Kern 
> wrote:
> >> >>> >
> >> >>> > You don't need to use vectorize() on fermi(). fermi() will work
> just fine on arrays and should be much faster.
> >> >>>
> >> >>> Yes, it really does the trick. See the following for the benchmark
> >> >>> based on your suggestion:
> >> >>>
> >> >>> $ time python mu.py
> >> >>> [-10.999 -10.999 -10.999 ...  20. 20. 20.   ] [4.973e-84
> >> >>> 4.973e-84 4.973e-84 ... 4.973e-84 4.973e-84 4.973e-84]
> >> >>>
> >> >>> real0m41.056s
> >> >>> user0m43.970s
> >> >>> sys0m3.813s
> >> >>>
> >> >>>
> >> >>> But are there any ways to further improve/increase efficiency?
> >> >>
> >> >>
> >> >>
> >> >> I believe it will get a bit better if you don’t column_stack an
> array 6000 times - maybe pre-allocate your output first?
> >> >>
> >> >> Andrea.
> >> >
> >> >
> >> >
> >> > I’m sorry, scratch that: I’ve seen a ghost white space in front of
> your column_stack call and made me think you were stacking your results
> very many times, which is not the case.
> >>
> >> Still not so clear on your solutions for this problem. Could you
> >> please post here the corresponding snippet of your enhancement?
> >
> >
> > I have no solution, I originally thought you were calling “column_stack”
> 6000 times in the loop, but that is not the case, I was mistaken. My
> apologies for that.
> >
> > The timings of your approach is highly dependent on the size of your
> “energy” and “DOS” array -
>
> The size of the “energy” and “DOS” array is Problem-related and
> shouldn't be reduced arbitrarily.
>
> > not to mention calling trapz 6000 times in a loop.
>
> I'm currently thinking on parallelization the execution of the for
> loop, say, with joblib , but I still
> haven't figured out the corresponding codes. If you have some
> experience on this type of solution, could you please give me some
> more hints?
>
> >  Maybe there’s a better way to do it with another approach, but at the
> moment I can’t think of one...
> >
> >>
> >>
> >> Regards,
> >> HY
> >> >
> >> >>
> >> >>
> >> >>>
> >> >>>
> >> >>> Regards,
> >> >>> HY
> >> >>>
> >> >>> >
> >> >>> > On Sat, Oct 10, 2020, 8:23 AM Hongyi Zhao 
> wrote:
> >> >>> >>
> >> >>> >> Hi,
> >> >>> >>
> >> >>> >> My environment is Ubuntu 20.04 and python 3.8.3 managed by
> pyenv. I
> >> >>> >> try to run the script
> >> >>> >> <
> https://notebook.rcc.uchicago.edu/files/acs.chemmater.9b05047/Data/bulk/dft/mu.py
> >,
> >> >>> >> but it will keep running and never end. When I use 'Ctrl + c' to
> >> >>> >> terminate it, it will give the following output:
> >> >>> >>
> >> >>> >> $ python mu.py
> >> >>> >> [-10.999 -10.999 -10.999 ...  20. 20. 20.   ] [4.973e-84
> >> >>> >> 4.973e-84 4.973e-84 ... 4.973e-84 4.973e-84 4.973e-84]
> >> >>> >>
> >> >>> >> I have to terminate it and obtained the following information:
> >> >>> >>
> >> >>> >> ^CTraceback (most recent call last):
> >> >>> >>   File "mu.py", line 38, in 
> >> >>> >> integrand=DOS*fermi_array(energy,mu,kT)
> >> >>> >>   File
> "/home/werner/.pyenv/versions/datasci/lib/python3.8/site-packages/numpy/lib/function_base.py",
> >> >>> >> line 2108, in __call__
> >> >>> >> return self._vectorize_call(func=func, args=vargs)
> >> >>> >>   File
> "/home/werner/.pyenv/versions/datasci/lib/python3.8/site-packages/numpy/lib/function_base.py",
> >> >>> >> line 2192, in _vectorize_call
> >> >>> >> outputs = ufunc(*inputs)
> >> >>> >>   File "mu.py", line 8, in fermi
> >> >>> >> return 1./(exp((E-mu)/kT)+1)
> >> >>> >> KeyboardInterrupt
> >> >>> >>
> >> >>> >>
> >> >>> >> Any helps and hints for this problem will be highly appreciated?
> >> >>> >>
> >> >>> >> Regards,
> >> >>> >> --
> >> >>> >> Hongyi Zhao 
> >> >>> >> ___
> >> >>> >> NumPy-Discussion mailing list
> >> >>> >> NumPy-Discussion@python.org
> >> >>> >> https://mail.python.org/mailman/listinfo/numpy-discussion
> >> >>> >
> >> >>> > ___
> >> >>> > NumPy-Discussion mailing list
> >> >>> 

Re: [Numpy-discussion] the NumPy paper is out!

2020-09-17 Thread Evgeni Burovski
Congratulations!

On Wed, Sep 16, 2020 at 8:54 PM Ralf Gommers  wrote:
>
> Hi all,
>
> Nature published our first official paper: 
> https://www.nature.com/articles/s41586-020-2649-2
>
> And here is the Twitter announcement: 
> https://twitter.com/numpy_team/status/1306268442450972674
>
> Thanks to everyone who contributed to this milestone! A few special mentions:
>
> Chuck, for many years of tireless labour keeping our code base in good shape, 
> and being the top committer on the NumPy project by now with more than twice 
> the commits (says `git shortlog -sn`) as the number two committer. This is 
> why Chuck is the first author on the paper.
>
> Travis, for creating NumPy. This is why Travis is the last author on the 
> paper, as our "elder statesman".
>
> Jarrod and Stéfan, for spending a ton of time on making this the best 
> possible paper we could possibly produce on NumPy.
>
> Cheers,
> Ralf
>
>
> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@python.org
> https://mail.python.org/mailman/listinfo/numpy-discussion
___
NumPy-Discussion mailing list
NumPy-Discussion@python.org
https://mail.python.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] reseed random generator (1.19)

2020-07-04 Thread Evgeni Burovski
Thanks Kevin, thanks Robert, this is very helpful!

I'd strongly agree with Matti that your explanations could/should make
it to the docs. Maybe it's something for the GSoD.

While we're on the subject, one comment and two (hopefully last) questions:

1. My two cents w.r.t. `np.random.simple_seed()` function Robert
mentioned: I personally would find it way more confusing than a clear
explanation + example in the docs. I'd ask myself what's "simple"
here, click through to the source of this `simple_seed`, find out that
it's a docsting and a two-liner, and just copy-paste the latter into
my user code. Again, just FWIW.

2. What would be a preferred way of spelling out "give me the N-th
spawned child SeedSequence"?
The use case is that I prepare (human-readable) input files once and
run a number of computational jobs in separate OS processes. From what
Kevin said, I can of course give each worker a pair of (entropy,
worker_id) and then each of them does at startup

> parent_seq = SeedSequence(entropy)
> this_sequence = parent_seq.spawn(worker_id + 1)[worker_id]

Is this a recommended way, or is there a better API? Or does the
number of spawned children need to be known beforehand?
I'd much rather avoid serialization/deserialization if possible.
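
Spelled out, the two candidates I can see look roughly like this (a
sketch, not a recommendation):

from numpy.random import SeedSequence, default_rng

entropy = 382193877439745928479635728

def rng_for_worker_spawn(worker_id):
    # deterministically reconstruct the worker_id-th spawned child;
    # wasteful, since it builds worker_id + 1 children to keep only the last
    child = SeedSequence(entropy).spawn(worker_id + 1)[worker_id]
    return default_rng(child)

def rng_for_worker_tuple(worker_id):
    # or fold the worker id into the entropy tuple instead
    return default_rng(SeedSequence((entropy, worker_id)))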

3. Is there a way of telling how many draws a generator has made?

The use case is to checkpoint the number of draws and `.advance` the
bit generator when resuming from the checkpoint. (The runs are longer
than the batch queue limits.)
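
One alternative I can see is to checkpoint the raw bit generator state
instead of a draw count --- a sketch (assuming pickling the state dict
to a file is acceptable):

import pickle
from numpy.random import Generator, PCG64

rng = Generator(PCG64(12345))
rng.standard_normal(10000)           # ... part of the run ...

# checkpoint: the state is a small picklable dict
with open("rng_state.pkl", "wb") as f:
    pickle.dump(rng.bit_generator.state, f)

# resume in a fresh process
rng2 = Generator(PCG64())
with open("rng_state.pkl", "rb") as f:
    rng2.bit_generator.state = pickle.load(f)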

Thanks!

Evgeni

On Mon, Jun 29, 2020 at 11:06 PM Robert Kern  wrote:
>
> On Mon, Jun 29, 2020 at 11:30 AM Robert Kern  wrote:
>>
>> On Mon, Jun 29, 2020 at 11:10 AM Kevin Sheppard  
>> wrote:
>>>
>>> The total number of digits in the binary representation is somewhere 
>>> between 32 and 128.
>>
>>
>> I like using the standard library `secrets` module.
>>
>> >>> import secrets
>> >>> secrets.randbelow(1<<128)
>> 8080125189471896523368405732926911908
>>
>> If you want an easy-to-follow rule, just use the above snippet to get a 
>> 128-bit number. More than 128 bits won't do you any good (at least by 
>> default, the internal bottleneck inside of SeedSequence is a 128-bit pool), 
>> and 128-bit numbers are just about small enough to copy-paste comfortably.
>
>
> Sorry, `secrets.randbits(128)` is the cleaner form of this.
>
> --
> Robert Kern
> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@python.org
> https://mail.python.org/mailman/listinfo/numpy-discussion
___
NumPy-Discussion mailing list
NumPy-Discussion@python.org
https://mail.python.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] reseed random generator (1.19)

2020-06-29 Thread Evgeni Burovski
Thanks Kevin!

A possibly dumb follow-up question: in your example,

> entropy = 382193877439745928479635728

is it relevant that `entropy` is a long integer? I.e., what are the
constraints on its value, can one use

entropy = 1234 or
entropy = 0 or
entropy = 1

instead?




On Mon, Jun 29, 2020 at 5:37 PM Kevin Sheppard
 wrote:
>
> The best practice is to use a SeedSequence to spawn child SeedSequences, and 
> then to use these children to initialize your generators or bit generators.
>
>
>
>
>
> from numpy.random import SeedSequence, Generator, PCG64, default_rng
>
>
>
> entropy = 382193877439745928479635728
>
>
>
> seed_seq = SeedSequence(entropy)
>
> NUM_STREAMS = 2**15
>
> children = seed_seq.spawn(NUM_STREAMS)
>
> # if you want the current best bit generator, which may change
>
> rngs = [default_rng(child) for child in children]
>
> # If you want the most control across version, set the bit generator
>
> # this uses PCG64, which is the current default. Each bit generator needs to 
> be wrapped in a generator
>
> rngs = [Generator(PCG64(child)) for child in children]
>
>
>
> Kevin
>
>
>
>
>
> From: Evgeni Burovski
> Sent: Monday, June 29, 2020 2:21 PM
> To: Discussion of Numerical Python
> Subject: Re: [Numpy-discussion] reseed random generator (1.19)
>
>
>
> (apologies for jumping into a conversation)
>
> So what is the recommendation for instantiating a number of generators
>
> with manually controlled seeds?
>
>
>
> The use case is running a series of MC simulations with reproducible
>
> streams. The runs are independent and are run in parallel in separate
>
> OS processes, where I do not control the time each process starts
>
> (jobs are submitted to the batch queue), so default seeding seems
>
> dubious?
>
>
>
> Previously, I would just do roughly
>
>
>
> seeds = [1234, 1235, 1236, ...]
>
> rngs = [np.random.RandomState(seed) for seed in seeds]
>
> ...
>
> and each process operates with its own `rng`.
>
> What would be the recommended way with the new `Generator` framework?
>
> A human-friendly way would be preferable if possible.
>
>
>
> Thanks,
>
>
>
> Evgeni
>
>
>
>
>
> On Mon, Jun 29, 2020 at 3:20 PM Kevin Sheppard
>
>  wrote:
>
> >
>
> > If you want to use the same entropy-initialized generator for 
> > temporarily-reproducible experiments, then you can use
>
> >
>
> >
>
> >
>
> > gen = np.random.default_rng()
>
> >
>
> > state = gen.bit_generator.state
>
> >
>
> > gen.standard_normal()
>
> >
>
> > # 0.5644742559549797, will vary across runs
>
> >
>
> > gen.bit_generator.state = state
>
> >
>
> > gen.standard_normal()
>
> >
>
> > # Always the same as before 0.5644742559549797
>
> >
>
> >
>
> >
>
> > The equivalent to the old way of calling seed to reseed is:
>
> >
>
> >
>
> >
>
> > SEED = 918273645
>
> >
>
> > gen = np.random.default_rng(SEED)
>
> >
>
> > gen.standard_normal()
>
> >
>
> > # 0.12345677
>
> >
>
> > gen = np.random.default_rng(SEED)
>
> >
>
> > gen.standard_normal()
>
> >
>
> > # Identical value
>
> >
>
> >
>
> >
>
> > Rather than reseeding the same object, you just create a new object. At 
> > some point in the development of Generator both methods were timed and 
> > there was no performance to reusing the same object by reseeding.
>
> >
>
> >
>
> >
>
> > Kevin
>
> >
>
> >
>
> >
>
> >
>
> >
>
> >
>
> >
>
> > From: Neal Becker
>
> > Sent: Monday, June 29, 2020 1:01 PM
>
> > To: Discussion of Numerical Python
>
> > Subject: Re: [Numpy-discussion] reseed random generator (1.19)
>
> >
>
> >
>
> >
>
> > I was using this to reset the generator, in order to repeat the same 
> > sequence again for testing purposes.
>
> >
>
> >
>
> >
>
> > On Wed, Jun 24, 2020 at 6:40 PM Robert Kern  wrote:
>
> >
>
> > On Wed, Jun 24, 2020 at 3:31 PM Neal Becker  wrote:
>
> >
>
> > Consider the following:
>
> >
>
> >
>
> >
>
> > from numpy.random import default_rng
>
> > rs = default_rng()
>
> >
>
> >
>
> >
>
> > Now how do I re-seed the generator?
>

Re: [Numpy-discussion] reseed random generator (1.19)

2020-06-29 Thread Evgeni Burovski
(apologies for jumping into a conversation)
So what is the recommendation for instantiating a number of generators
with manually controlled seeds?

The use case is running a series of MC simulations with reproducible
streams. The runs are independent and are run in parallel in separate
OS processes, where I do not control the time each process starts
(jobs are submitted to the batch queue), so default seeding seems
dubious?

Previously, I would just do roughly

seeds = [1234, 1235, 1236, ...]
rngs = [np.random.RandomState(seed) for seed in seeds]
...
and each process operates with its own `rng`.
What would be the recommended way with the new `Generator` framework?
A human-friendly way would be preferable if possible.
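
Something along these lines is what I imagine (a sketch --- not sure it
is the blessed way):

from numpy.random import SeedSequence, default_rng

# roughly the moral equivalent of the RandomState list above
children = SeedSequence(1234).spawn(3)
rngs = [default_rng(child) for child in children]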

Thanks,

Evgeni


On Mon, Jun 29, 2020 at 3:20 PM Kevin Sheppard
 wrote:
>
> If you want to use the same entropy-initialized generator for 
> temporarily-reproducible experiments, then you can use
>
>
>
> gen = np.random.default_rng()
>
> state = gen.bit_generator.state
>
> gen.standard_normal()
>
> # 0.5644742559549797, will vary across runs
>
> gen.bit_generator.state = state
>
> gen.standard_normal()
>
> # Always the same as before 0.5644742559549797
>
>
>
> The equivalent to the old way of calling seed to reseed is:
>
>
>
> SEED = 918273645
>
> gen = np.random.default_rng(SEED)
>
> gen.standard_normal()
>
> # 0.12345677
>
> gen = np.random.default_rng(SEED)
>
> gen.standard_normal()
>
> # Identical value
>
>
>
> Rather than reseeding the same object, you just create a new object. At some 
> point in the development of Generator both methods were timed and there was 
> no performance to reusing the same object by reseeding.
>
>
>
> Kevin
>
>
>
>
>
>
>
> From: Neal Becker
> Sent: Monday, June 29, 2020 1:01 PM
> To: Discussion of Numerical Python
> Subject: Re: [Numpy-discussion] reseed random generator (1.19)
>
>
>
> I was using this to reset the generator, in order to repeat the same sequence 
> again for testing purposes.
>
>
>
> On Wed, Jun 24, 2020 at 6:40 PM Robert Kern  wrote:
>
> On Wed, Jun 24, 2020 at 3:31 PM Neal Becker  wrote:
>
> Consider the following:
>
>
>
> from numpy.random import default_rng
> rs = default_rng()
>
>
>
> Now how do I re-seed the generator?
>
> I thought perhaps rs.bit_generator.seed(), but there is no such attribute.
>
>
>
> In general, reseeding an existing generator instance is not a good practice. 
> What effect are you trying to accomplish? I assume that you are asking this 
> because you are currently using `RandomState.seed()`. In what circumstances?
>
>
>
> The raw `bit_generator.state` property *can* be assigned to, in order to 
> support some advanced use cases (mostly involving de/serialization and 
> similar kinds of meta-programming tasks). It's also been helpful for me to 
> construct worst-case scenarios for testing parallel streams. But it quite 
> deliberately bypasses the notion of deriving the state from a human-friendly 
> seed number.
>
>
>
> --
>
> Robert Kern
>
> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@python.org
> https://mail.python.org/mailman/listinfo/numpy-discussion
>
>
>
>
> --
>
> Those who don't understand recursion are doomed to repeat it
>
>
>
> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@python.org
> https://mail.python.org/mailman/listinfo/numpy-discussion
___
NumPy-Discussion mailing list
NumPy-Discussion@python.org
https://mail.python.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] New DTypes: Are scalars a central concept in NumPy or not?

2020-02-22 Thread Evgeni Burovski
Hi Sebastian,

Just to clarify the difference:

>>> x = np.float64(42)
>>> y = np.array(42, dtype=float)

Here `x` is a scalar and `y` is a 0D array, correct?
If that's the case, not having the former would be very confusing for
users (at least, that would be very confusing to me, FWIW).

If anything, I think it'd be cleaner to not have the latter, and only
have either scalars or 1D arrays (i.e., N-D arrays with N>=1), but it
is probably way too late to even think about it anyway.
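
For concreteness, the user-visible differences I have in mind (a quick
sketch):

import numpy as np

x = np.float64(42)       # scalar: immutable, hashable
y = np.array(42.0)       # 0-D array: writable, not hashable

d = {x: "ok"}            # works: scalars can be dict keys
y[()] = 43.0             # works: 0-D arrays are mutable in place
# {y: "boom"}            # TypeError: unhashable type: 'numpy.ndarray'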

Cheers,

Evgeni

On Sat, Feb 22, 2020 at 4:37 AM Sebastian Berg
 wrote:
>
> Hi all,
>
> When we create new datatypes, we have the option to make new choices
> for the new datatypes [0] (not the existing ones).
>
> The question is: Should every NumPy datatype have a scalar associated
> and should operations like indexing return a scalar or a 0-D array?
>
> This is in my opinion a complex, almost philosophical, question, and we
> do not have to settle anything for a long time. But, if we do not
> decide a direction before we have many new datatypes the decision will
> make itself...
> So happy about any ideas, even if its just a gut feeling :).
>
> There are various points. I would like to mostly ignore the technical
> ones, but I am listing them anyway here:
>
>   * Scalars are faster (although that can be optimized likely)
>
>   * Scalars have a lower memory footprint
>
>   * The current implementation incurs a technical debt in NumPy.
> (I do not think that is a general issue, though. We could
> automatically create scalars for each new datatype probably.)
>
> Advantages of having no scalars:
>
>   * No need to keep track of scalars to preserve them in ufuncs, or
> libraries using `np.asarray`, do they need `np.asarray_or_scalar`?
> (or decide they return always arrays, although ufuncs may not)
>
>   * Seems simpler in many ways, you always know the output will be an
> array if it has to do with NumPy.
>
> Advantages of having scalars:
>
>   * Scalars are immutable and we are used to them from Python.
> A 0-D array cannot be used as a dictionary key consistently [1].
>
> I.e. without scalars as first class citizen `dict[arr1d[0]]`
> cannot work, `dict[arr1d[0].item()]` may (if `.item()` is defined,
> and e.g. `dict[arr1d[0].frozen()]` could make a copy to work. [2]
>
>   * Object arrays as we have them now make sense, `arr1d[0]` can
> reasonably return a Python object. I.e. arrays feel more like
> container if you can take elements out easily.
>
> Could go both ways:
>
>   * Scalar math `scalar = arr1d[0]; scalar += 1` modifies the array
> without scalars. With scalars `arr1d[0, ...]` clarifies the
> meaning. (In principle it is good to never use `arr2d[0]` to
> get a 1D slice, probably more-so if scalars exist.)
>
> Note: array-scalars (the current NumPy scalars) are not useful in my
> opinion [3]. A scalar should not be indexed or have a shape. I do not
> believe in scalars pretending to be arrays.
>
> I personally tend towards liking scalars.  If Python was a language
> where the array (array-programming) concept was ingrained into the
> language itself, I would lean the other way. But users are used to
> scalars, and they "put" scalars into arrays. Array objects are in some
> ways strange in Python, and I feel not having scalars detaches them
> further.
>
> Having scalars, however also means we should preserve them. I feel in
> principle that is actually fairly straight forward. E.g. for ufuncs:
>
>* np.add(scalar, scalar) -> scalar
>* np.add.reduce(arr, axis=None) -> scalar
>* np.add.reduce(arr, axis=1) -> array (even if arr is 1d)
>* np.add.reduce(scalar, axis=()) -> array
>
> Of course libraries that do `np.asarray` would/could basically chose to
> not preserve scalars: Their signature is defined as taking strictly
> array input.
>
> Cheers,
>
> Sebastian
>
>
> [0] At best this can be a vision to decide which way they may evolve.
>
> [1] E.g. PyTorch uses `hash(tensor) == id(tensor)` which is arguably
> strange. E.g. Quantity defines hash correctly, but does not fully
> ensure immutability for 0-D Quantities. Ensuring immutability in a
> world where "views" are a central concept requires a write-only copy.
>
> [2] Arguably `.item()` would always return a scalar, but it would be a
> second class citizen. (Although if it returns a scalar, at least we
> already have a scalar implementation.)
>
> [3] They are necessary due to technical debt for NumPy datatypes
> though.
> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@python.org
> https://mail.python.org/mailman/listinfo/numpy-discussion
___
NumPy-Discussion mailing list
NumPy-Discussion@python.org
https://mail.python.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] a new grant for NumPy and OpenBLAS!

2019-11-15 Thread Evgeni Burovski
Congratulations!

Fri, 15 Nov 2019, 2:42 Ralf Gommers :

> Hi all,
>
> I'm very pleased to announce that NumPy and OpenBLAS have received a joint
> grant for $195,000 from the Chan Zuckerberg Initiative.
>
> In summary, this grant is for high-level documentation, website
> development and graphic design, governance activities and community
> building for NumPy, and for technical work on OpenBLAS (which is one of
> NumPy's key dependencies). I wrote a blog post on what this grant is about
> and some background at [1], and published the full proposal at [2]. The
> program managers wrote a blog post about the whole program [3] which is
> also well worth reading.
>
> I'm looking forward to what we'll be able to do with this grant. The work
> is planned to start quite soon, Dec 1st, and run for one year.
>
> Questions and ideas on any aspect of this grant are very welcome!
>
> Cheers,
> Ralf
>
> [1] https://labs.quansight.org/blog/2019/11/numpy-openblas-CZI-grant/
> [2]
> https://figshare.com/articles/Proposal_NumPy_OpenBLAS_for_Chan_Zuckerberg_Initiative_EOSS_2019_round_1/10302167
> [3]
> https://medium.com/@cziscience/the-invisible-foundations-of-biomedicine-4ab7f8d4f5dd
>
> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@python.org
> https://mail.python.org/mailman/listinfo/numpy-discussion
>
___
NumPy-Discussion mailing list
NumPy-Discussion@python.org
https://mail.python.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Unsupporting python3.5

2019-10-19 Thread Evgeni Burovski
>
>

>>> That's fair. The 1.17.3 release supporting Python 3.5--3.8 went well
>>> enough. The only glitch was that I had to explicitly use OSX 10.6 and icode
>>> 6.4 for the 3.5 wheels on the Mac.
>>>
>>
>> So do you have a preference for dropping or not dropping for 1.18?
>>
>
> Let's not drop it. Four weeks isn't that long to wait.
>

+1 from the peanut gallery.

Cheers,

Evgeni


>
___
NumPy-Discussion mailing list
NumPy-Discussion@python.org
https://mail.python.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Low-level API for Random

2019-09-19 Thread Evgeni Burovski
>>
>>
>> 1. Should there be a prefix on the C functions?
>> 2. If so, what should the prefix be?
>>
>
Preferably, yes. Don't have an opinion on an exact prefix, as long as it
allows me to e.g. swap a normal distribution generator in my cython/c++
user code without too much mess.


if the only goal is to let people write new generators rather than use the
> existing ones from Cython without the Python overhead).
>

Is it the only goal?

If possible, it'd be worth IMO supporting something more like
cython_lapack, so that one can use existing machinery from a cython
application.

Use case: an MC application where drawing random variates is in a hot loop.
Then I can start from a python prototype and cythonize it gradually.
Sure, I can reach into non-public parts, but I'd rather not have to.


In the end we want to get to a doc section similar to
> http://scipy.github.io/devdocs/special.cython_special.html I'd think.
>
> 3. Should the legacy C functions be part of the API -- these are mostly
>> the ones that produce or depend on polar transform normals (Box-Muller). I
>> have a feeling no, but there may be reasons to prefer BM since they do not
>> depend on rejection sampling.
>>
>
> Even if there would be a couple of users interested, it would be odd
> starting to do this after deeming the code "legacy". So I agree with your
> "no".
>

Unless it's a big maintenance burden, is there an issue with exposing both
ziggurat_normal and bm_normal?
Sure, I can cook up a BM transform myself (yet again), though.


>
>> 4. Should low-level API be consumable like any other numpy C API by
>> including the usual header locations and library locations?  Right now, the
>> pxd simplifies writing Cython but users have sp specify the location of the
>> headers and source manually  An alternative would be to provide a function
>> like np.get_include() -> np.random.get_include() that would specialize in
>> random.
>>
>
> Good question. I'm not sure this is "like any other NumPy C API". We don't
> provide a C API for fft, linalg or other functionality further from core
> either. It's possible of course, but does it really help library authors or
> end users?
>

While I gave only anecdotal evidence, not hard data, I suspect that both
cython and C API would be useful.
E.g. there are C++ applications which use boost::random; it would be nice
to be able to swap it for numpy.random.
Also reproducibility: it's *much* easier to debug the compiled app vs its
python prototype if random streams are identical.

Like I said, take all I'm saying with enough salt, as I'm wearing my user
hat here.

Cheers,
Evgeni
___
NumPy-Discussion mailing list
NumPy-Discussion@python.org
https://mail.python.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Style guide for numpy code?

2019-05-10 Thread Evgeni Burovski
Hi Joe,

Thanks for sharing!

I'm going to use your handout as a base for my numerical computing classes,
(with an appropriate citation, of course :-)).




Thu, 9 May 2019, 21:19 Joe Harrington :

> I have a handout for my PHZ 3150 Introduction to Numerical Computing
> course that includes some rules:
>
> (a) All integer-valued floating-point numbers should have decimal points
> after them. For
> example, if you have a time of 10 sec, do not use
>
> y = np.e**10 # sec
>
> use
>
> y = np.e**10. # sec
>
> instead.  For example, an item count is always an integer, but a distance
> is always a float.  A decimal in the range (-1,1) must always have a zero
> before the decimal point, for readability:
>
> x = 0.23 # Right!
>
> x = .23 # WRONG
>
> The purpose of this one is simply to build the decimal-point habit.  In
> Python it's less of an issue now, but sometimes code is translated, and
> integer division is still out there.  For that reason, in other languages,
> it may be desirable to use a decimal point even for counts, unless integer
> division is wanted.  Make a comment whenever you intend integer division
> and the language uses the same symbol (/) for both kinds of division.
>
> (b) Use spaces around binary operations and relations (=<>+-*/). Put a
> space after “,”.
> Do not put space around “=” in keyword arguments, or around “ ** ”.
>
> (c) Do not put plt.show() in your homework file! You may put it in a
> comment if you
> like, but it is not necessary. Just save the plot. If you say
>
> plt.ion()
>
> plots will automatically show while you are working.
>
> (d) Use:
>
> import matplotlib.pyplot as plt
>
> NOT:
>
> import matplotlib.pylab as plt
>
> (e) Keep lines to 80 characters, max, except in rare cases that are well
> justified, such as
> very long strings. If you make comments on the same line as code, keep
> them short or
> break them over more than a line:
>
> code = code2   # set code equal to code2
>
> # Longer comment requiring much more space because
> # I'm explaining something complicated.
> code = code2
>
> code = code2   # Another way to do a very long comment,
># like this one, which runs over more than
># one line.
>
> (f) Keep blocks of similar lines internally lined up on decimals,
> comments, and = signs.  This makes them easier to read and verify.  There
> will be some cases when this is impractical.  Use your judgment (you're not
> a computer, you control the computer!):
>
> x=   1.  # this is a comment
> y= 378.2345  # here's another
> fred = chuck # note how the decimals, = signs, and
>  # comments line up nicely...
> alacazamshmazooboloid = 2721 # but not always!
>
> (g) Put the units and sources of all values in comments:
>
> t_planet = 523. # K, Smith and Jones (2016, ApJ 234, 22)
>
> (h) I don't mean to start a religious war, but I emphasize the alignment
> of similar adjacent code lines to make differences pop out and reduce the
> likelihood of bugs.  For example, it is much easier to verify the
> correctness of:
>
> a = 3 * x + 3 * 8. * short- 5. * np.exp(np.pi * omega * t)
> a_alt = 3 * x + 3 * 8. * anotshortvar - 5. * np.exp(np.pi * omega * t)
>
> than:
>
> a = 3 * x + 3 * 8. * short - 5. * np.exp(np.pi * omega * t)
> a_altvarname = 3 * x + 3*9*anotshortvar - 5. * np.exp(np.pi * omega * i)
>
> (i) Assign values to meaningful variables, and use them in formulae and
> functions:
>
> ny = 512
> nx = 512
> image = np.zeros((ny, nx))
> expr1 = ny * 3
> expr2 = nx * 4
>
> Otherwise, later on when you upgrade to 2560x1440 arrays, you won't know
> which of the 512s are in the x direction and which are in the y direction.
> Or, the student you (now a senior researcher) assign to code the upgrade
> won't!  Also, it reduces bugs arising from the order of arguments to
> functions if the args have meaningful names.  This is not to say that you
> should assign all numbers to functions.  This is fine:
>
> circ = 2 * np.pi * r
>
> (j) All functions assigned for grading must have full docstrings in
> numpy's format, as well as internal comments.  Utility functions not
> requested in the assignment and that the user will never see can have
> reduced docstrings if the functions are simple and obvious, but at least
> give the one-line summary.
>
> (k) If you modify an existing function, you must either make a Git entry
> or, if it is not under revision control, include a Revision History section
> in your docstring and record your name, the date, the version number, your
> email, and the nature of the change you made.
>
> (l) Choose variable names that are meaningful and consistent in style.
> Document your style either at the head of a module or in a separate text
> file for the project.  For example, if you use CamelCaps with initial
> capital, say that.  If you reserve initial capitals for classes, say that.
> If you use underscores for variable subscripts and camelCaps for the base
> variables, say 

Re: [Numpy-discussion] ANN: SciPy 1.3.0rc1 -- please test

2019-04-27 Thread Evgeni Burovski
For ``pinv``, ``pinv2``, and ``pinvh``, the default cutoff values are changed
> for consistency (see the docs for the actual values).
>
> `scipy.stats` changes
> -
>
> Previously, ``ks_2samp(data1, data2)`` would run a two-sided test and return
> the approximated p-value. The new signature, ``ks_2samp(data1, data2,
> alternative="two-sided", method="auto")``, still runs the two-sided test by
> default but returns the exact p-value for small samples and the approximated
> value for large samples. ``method="asymp"`` would be equivalent to the
> old version but ``auto`` is the better choice.
>
> Other changes
> =
>
> Our tutorial has been expanded with a new section on global optimizers
>
> There has been a rework of the ``stats.distributions`` tutorials.
>
> `scipy.optimize` now correctly sets the convergence flag of the result to
> ``CONVERR``, a convergence error, for bounded scalar-function root-finders
> if the maximum iterations has been exceeded, ``disp`` is false, and
> ``full_output`` is true.
>
> `scipy.optimize.curve_fit` no longer fails if ``xdata`` and ``ydata`` dtypes
> differ; they are both now automatically cast to ``float64``.
>
> `scipy.ndimage` functions including ``binary_erosion``, ``binary_closing``, 
> and
> ``binary_dilation`` now require an integer value for the number of iterations,
> which alleviates a number of reported issues.
>
> Fixed normal approximation in case ``zero_method == "pratt"`` in
> `scipy.stats.wilcoxon`.
>
> Fixes for incorrect probabilities, broadcasting issues and thread-safety
> related to stats distributions setting member variables inside 
> ``_argcheck()``.
>
> `scipy.optimize.newton` now correctly raises a ``RuntimeError``, when default
> arguments are used, in the case that a derivative of value zero is obtained,
> which is a special case of failing to converge.
>
> A draft toolchain roadmap is now available, laying out a compatibility plan
> including Python versions, C standards, and NumPy versions.
>
>
> Authors
> ===
>
> * ananyashreyjain +
> * ApamNapat +
> * Scott Calabrese Barton +
> * Christoph Baumgarten
> * Peter Bell +
> * Jacob Blomgren +
> * Doctor Bob +
> * Mana Borwornpadungkitti +
> * Matthew Brett
> * Evgeni Burovski
> * CJ Carey
> * Vega Theil Carstensen +
> * Robert Cimrman
> * Forrest Collman +
> * Pietro Cottone +
> * David +
> * Idan David +
> * Christoph Deil
> * Dieter Werthmüller
> * Conner DiPaolo +
> * Dowon
> * Michael Dunphy +
> * Peter Andreas Entschev +
> * Gökçen Eraslan +
> * Johann Faouzi +
> * Yu Feng
> * Piotr Figiel +
> * Matthew H Flamm
> * Franz Forstmayr +
> * Christoph Gohlke
> * Richard Janis Goldschmidt +
> * Ralf Gommers
> * Lars Grueter
> * Sylvain Gubian
> * Matt Haberland
> * Yaroslav Halchenko
> * Charles Harris
> * Lindsey Hiltner
> * JakobStruye +
> * He Jia +
> * Jwink3101 +
> * Greg Kiar +
> * Julius Bier Kirkegaard
> * John Kirkham +
> * Thomas Kluyver
> * Vladimir Korolev +
> * Joseph Kuo +
> * Michael Lamparski +
> * Eric Larson
> * Denis Laxalde
> * Katrin Leinweber
> * Jesse Livezey
> * ludcila +
> * Dhruv Madeka +
> * Magnus +
> * Nikolay Mayorov
> * Mark Mikofski
> * Jarrod Millman
> * Markus Mohrhard +
> * Eric Moore
> * Andrew Nelson
> * Aki Nishimura +
> * OGordon100 +
> * Petar Mlinarić +
> * Stefan Peterson
> * Matti Picus +
> * Ilhan Polat
> * Aaron Pries +
> * Matteo Ravasi +
> * Tyler Reddy
> * Ashton Reimer +
> * Joscha Reimer
> * rfezzani +
> * Riadh +
> * Lucas Roberts
> * Heshy Roskes +
> * Mirko Scholz +
> * Taylor D. Scott +
> * Srikrishna Sekhar +
> * Kevin Sheppard +
> * Sourav Singh
> * skjerns +
> * Kai Striega
> * SyedSaifAliAlvi +
> * Gopi Manohar T +
> * Albert Thomas +
> * Timon +
> * Paul van Mulbregt
> * Jacob Vanderplas
> * Daniel Vargas +
> * Pauli Virtanen
> * VNMabus +
> * Stefan van der Walt
> * Warren Weckesser
> * Josh Wilson
> * Nate Yoder +
> * Roman Yurchak
>
> A total of 97 people contributed to this release.
> People with a "+" by their names contributed a patch for the first time.
> This list of names is automatically generated, and may not be fully complete.
>
> Issues closed for 1.3.0
> ---
>
> * `#1320 <https://github.com/scipy/scipy/issues/1320>`__: 
> scipy.stats.distribution: problem with self.a, self.b if they...
> * `#2002 <https://github.com/scipy/scipy/issues/2002>`__: members set in 
> scipy.stats.distributions.##._argcheck (Trac #1477)
> * `#2823 <https://github.com/

Re: [Numpy-discussion] Google Summer of Docs ideas?

2019-04-18 Thread Evgeni Burovski
Hi Ralf,

If all this needs is a name, you can add mine. How much time I can put into
it though is... not much, I'm afraid.

Cheers,

Evgeni

Thu, 18 Apr 2019 at 23:51, Ralf Gommers ralf.gomm...@gmail.com:

>
>
> On Sun, Apr 14, 2019 at 6:42 PM Ralf Gommers 
> wrote:
>
>>
>>
>> On Tue, Apr 2, 2019 at 10:45 AM Ralf Gommers 
>> wrote:
>>
>>> Hi all,
>>>
>>> NumFOCUS has applied as an umbrella org for the inaugural Google Summer
>>> of Docs, and we're participating. We need 1-2 ideas and to work them out very
>>> well (ideas need to be high quality; it's not yet certain that NumFOCUS will
>>> be accepted as an umbrella org). Guidelines are at
>>> https://developers.google.com/season-of-docs/docs/project-ideas#project-idea
>>>
>>> Any suggestions for ideas?
>>>
>>
>> I have added an idea and page on the NumFOCUS GSoD site in
>> https://github.com/numfocus/gsod/pull/4. That also has some content that
>> we may want to reuse for a better contributor guide.
>>
>
> Looks like we are applying as a project and not under the NumFOCUS
> umbrella (we got some feedback that this would give a better chance of
> success). I can do the submission. We need a second name to be a GSoD admin
> though - does anyone want to volunteer? (There should be no actual work
> involved.)
>
> Thanks,
> Ralf
>


Re: [Numpy-discussion] pytest, fixture and parametrize

2018-08-08 Thread Evgeni Burovski
This is actually available from both nose (the old-style np.testing framework)
and pytest. Stdlib unittest supports the self.assertRaises context manager
since Python 3.1.
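
For completeness, a minimal stdlib-only sketch of the same pattern:

    import unittest

    class TestDivision(unittest.TestCase):
        def test_zero_division(self):
            # same idea as pytest.raises / np.testing.assert_raises
            with self.assertRaises(ZeroDivisionError):
                1 / 0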

On Wed, Aug 8, 2018, 7:30 PM Eric Wieser 
wrote:

> Is another nice feature
>
> You forget that we already have that feature in our testing layer,
>
> with np.testing.assert_raises(AnException):
> pass
>
> ​
>
> On Wed, 8 Aug 2018 at 09:08 Chris Barker - NOAA Federal <
> chris.bar...@noaa.gov> wrote:
>
>> BTW:
>>
> >> with pytest.raises(AnException):
> >>     ...
>>
>> Is another nice feature.
>>
>> -CHB


Re: [Numpy-discussion] Scipy 2017 NumPy sprint

2017-06-29 Thread Evgeni Burovski

>  Convert the doctest in `numpy/lib/tests/test_polynomial.py` to regular
tests. Might be tricky as it mostly checks print formatting.

Port scipy's refguide-check and enhance/fix/improve code examples in
docstrings?
Also somewhat janitorial though.