[Numpy-discussion] numpy.copyto alternative for previous versions than 1.7.0 ?

2014-04-11 Thread techaddict
Is there an alternative way to mimic the behaviour of numpy.copyto in
versions of numpy earlier than 1.7.0?



--
View this message in context: 
http://numpy-discussion.10968.n7.nabble.com/numpy-copyto-alternative-for-previous-versions-than-1-7-0-tp37282.html
Sent from the Numpy-discussion mailing list archive at Nabble.com.
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] numpy.copyto alternative for previous versions than 1.7.0 ?

2014-04-11 Thread techaddict
For example, how do I convert these to work on previous versions?

copyto(ndarray(shape=[length], buffer=ba, offset=16, dtype=float64), v)
and
copyto(ndarray(shape=[rows, cols], buffer=ba, offset=24, dtype=float64,
order='C'), m)



--
View this message in context: 
http://numpy-discussion.10968.n7.nabble.com/numpy-copyto-alternative-for-previous-versions-than-1-7-0-tp37282p37283.html
Sent from the Numpy-discussion mailing list archive at Nabble.com.


Re: [Numpy-discussion] numpy.copyto alternative for previous versions than 1.7.0 ?

2014-04-11 Thread Sebastian Berg
On Fr, 2014-04-11 at 02:36 -0700, techaddict wrote:
 Like how do i convert these to previous versions ?
 
 copyto(ndarray(shape=[length], buffer=ba, offset=16, dtype=float64), v)
 and
 copyto(ndarray(shape=[rows, cols], buffer=ba, offset=24, dtype=float64,
 order='C'), m)
 
 

The first thing that comes to mind is plain indexing, which on versions
that have copyto basically ends up calling copyto anyway, if I remember
correctly (assuming v is an array; otherwise the logic for indexing may be
slightly more complex).

So just using:

arr_view = ndarray(shape=[length], buffer=ba, offset=16, dtype=float64)
arr_view[...] = v

should do the trick.
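
Sebastian's view-and-assign idiom can be sketched end to end. This is a
minimal, self-contained version: `ba`, `length`, `v` and the 16-byte offset
follow the thread's example, with a plain bytearray standing in for whatever
buffer the original code used.

```python
import numpy as np

# `ba`, `length`, `v` and the 16-byte offset follow the thread's example;
# the bytearray here is just a stand-in for the real shared buffer.
length = 4
v = np.arange(length, dtype=np.float64)
ba = bytearray(16 + length * 8)  # 16 header bytes, then the float64 payload

# Build a writable view over the buffer and assign into it -- on numpy < 1.7
# this does what copyto(arr_view, v) does on 1.7+.
arr_view = np.ndarray(shape=(length,), buffer=ba, offset=16, dtype=np.float64)
arr_view[...] = v

print(arr_view)  # -> [0. 1. 2. 3.]
```

The assignment writes straight through the view into `ba`, so the bytes after
the 16-byte header now hold the float64 payload.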

- Sebastian

 




Re: [Numpy-discussion] Wiki page for building numerical stuff on Windows

2014-04-11 Thread Carl Kleffner
Hi,

a small correction: a recent Octave for Windows is here:
http://mxeoctave.osuv.de

see http://article.gmane.org/gmane.comp.gnu.octave.maintainers/38124 ...
A binary of Octave 3.8.0 on Windows is now being prepared as a voluntary
contribution by Markus Bergholz.

There is also a discussion about OpenBLAS on the Octave maintainers list:
http://article.gmane.org/gmane.comp.gnu.octave.maintainers/38746

Regards

Carl



2014-04-11 5:46 GMT+02:00 Matthew Brett matthew.br...@gmail.com:

 Hi,

 On Thu, Apr 10, 2014 at 8:10 PM, Sturla Molden sturla.mol...@gmail.com
 wrote:
  Matthew Brett matthew.br...@gmail.com wrote:
  Hi,
 
  I've been working on a general wiki page on building numerical stuff on
 Windows:
 
  https://github.com/numpy/numpy/wiki/Numerical-software-on-Windows
 
  I'm hoping to let Microsoft know what problems we're having, and
  seeing whether we numericists can share some work - us and R and Julia
  and Octave and so on.
 
  Feedback / edits very - very - welcome,
 
 
  It seems Microsoft is working on an accelerate framework of their own.
 
  https://ampblas.codeplex.com
  https://amplapack.codeplex.com
 
  https://ampfft.codeplex.com
  https://amprng.codeplex.com
  https://ampalgorithms.codeplex.com
 
  It seems to be written in C++ and require VS2012 to build, and possibly
  DirectX11 to run.

 For ampblas : https://ampblas.codeplex.com/SourceControl/latest#readme.txt

 
 This library contains an adaptation of the legacy cblas interface to BLAS
 for
 C++ AMP. At this point almost all interfaces are not implemented. One
 exception is the ampblas_saxpy and ampblas_daxpy which serve as a
 template for the
 implementation of other routines.
 

 Last commit appears to be October 2012 :
 https://ampblas.codeplex.com/SourceControl/list/changesets

 Cheers,

 Matthew


Re: [Numpy-discussion] Wiki page for building numerical stuff on Windows

2014-04-11 Thread Sturla Molden
Matthew Brett matthew.br...@gmail.com wrote:

 
 This library contains an adaptation of the legacy cblas interface to BLAS for
 C++ AMP. At this point almost all interfaces are not implemented. One
 exception is the ampblas_saxpy and ampblas_daxpy which serve as a
 template for the
 implementation of other routines.
 

Right, so they gave up.

By the way, it seems we have forgotten an important Fortran compiler for
Windows: Portland Group.

pgroup.com


Sturla



[Numpy-discussion] The BLAS problem (was: Re: Wiki page for building numerical stuff on Windows)

2014-04-11 Thread Nathaniel Smith
On Fri, Apr 11, 2014 at 12:21 PM, Carl Kleffner cmkleff...@gmail.com wrote:
 a discussion about OpenBLAS on the octave maintainer list:
 http://article.gmane.org/gmane.comp.gnu.octave.maintainers/38746

I'm getting the impression that OpenBLAS is being both a tantalizing
opportunity and a practical thorn-in-the-side for everyone -- Python,
Octave, Julia, R.

How crazy would it be to get together an organized effort to fix this
problem -- OpenBLAS for everyone? E.g., by collecting patches to fix
the bits we don't like (like unhelpful build system defaults),
applying more systematic QA, etc. Ideally this could be done upstream,
but if upstream is MIA or disagrees about OpenBLAS's goals, then it
could be maintained as a kind of OpenBLAS++ that merges regularly
from upstream (compare to [1][2][3] for successful projects handled in
this way). If hardware for testing is a problem, then I suspect
NumFOCUS would be overjoyed to throw a few kilodollars at buying one
instance of each widely-distributed microarchitecture released in the
last few years as a test farm...

I think the goal is pretty clear: a modern optionally-multithreaded
BLAS under a BSD-like license with a priority on correctness,
out-of-the-box functionality (like runtime configurability and feature
detection), speed, and portability, in that order.

I unfortunately don't have the skills to actually lead such an effort
(I've never written a line of asm in my life...), but surely our
collective communities have people who do?

-n

[1] http://www.openssh.com/portable.html
[2] http://www.eglibc.org/mission (a friendly fork of glibc holding
stuff that Ulrich Drepper got cranky about, which eventually was
merged back)
[3] https://en.wikipedia.org/wiki/Go-oo

-- 
Nathaniel J. Smith
Postdoctoral researcher - Informatics - University of Edinburgh
http://vorpus.org


Re: [Numpy-discussion] The BLAS problem (was: Re: Wiki page for building numerical stuff on Windows)

2014-04-11 Thread Sturla Molden
Nathaniel Smith n...@pobox.com wrote:

 I unfortunately don't have the skills to actually lead such an effort
 (I've never written a line of asm in my life...), but surely our
 collective communities have people who do?

The assembly part in OpenBLAS/GotoBLAS is the major problem. Not just that
it's AT&T syntax (i.e. it requires MinGW to build on Windows), but also
that it supports a wide range of processors. We just need a fast BLAS we
can use in Windows binary wheels (and possibly on Mac OS X). There is no
need to support anything other than x86 and AMD64 architectures. So in
theory one could throw out all the assembly and rewrite the kernels with
compiler intrinsics for the various SIMD architectures. Or one could just
rely on the compiler to autovectorize, and simply write the code so it is
easily vectorized. If we manually unroll loops properly, and make sure the
compiler is hinted about memory alignment and pointer aliasing, the
compiler will know what to do.

There is already a reference BLAS implementation at Netlib, which we could
translate to C and optimize for SIMD. Then we need a fast threadpool. I
have one I can donate, or we could use libxdispatch (a port of Apple's
libdispatch, aka GCD, to Windows and Linux). Even Intel could not make
their TBB perform better than libdispatch, and that is about what we need.
Or we could start with OpenBLAS and throw away everything we don't need.

Making a totally new BLAS might seem like a crazy idea, but it might be the
best solution in the long run. 


Sturla



Re: [Numpy-discussion] The BLAS problem (was: Re: Wiki page for building numerical stuff on Windows)

2014-04-11 Thread Sturla Molden
Sturla Molden sturla.mol...@gmail.com wrote:

 Making a totally new BLAS might seem like a crazy idea, but it might be the
 best solution in the long run. 

To see if this can be done, I'll try to re-implement cblas_dgemm and then
benchmark it against MKL, Accelerate and OpenBLAS. If I can get performance
to better than 75% of their speed, without any assembly or dark magic, just
plain C99 compiled with Intel icc, that would be sufficient for binary
wheels on Windows, I think.
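
As a yardstick for such a benchmark, one can measure the effective GFLOP/s
of whatever BLAS numpy is currently linked against. The sketch below is
illustrative (the function name and problem size are my own, not from the
thread); it times repeated dgemm calls through np.dot:

```python
import time
import numpy as np

def dgemm_gflops(n=512, repeats=3):
    # Rough effective GFLOP/s of the BLAS numpy is linked against.
    a = np.random.rand(n, n)
    b = np.random.rand(n, n)
    np.dot(a, b)  # warm-up call so the first timing isn't polluted
    best = float("inf")
    for _ in range(repeats):
        t0 = time.perf_counter()
        np.dot(a, b)
        best = min(best, time.perf_counter() - t0)
    # dgemm performs roughly 2*n^3 floating-point operations
    return 2.0 * n ** 3 / best / 1e9

print("%.1f GFLOP/s" % dgemm_gflops())
```

Running the same script against builds linked to MKL, Accelerate and
OpenBLAS would give the 75%-of-their-speed comparison Sturla describes.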

Sturla



Re: [Numpy-discussion] Wiki page for building numerical stuff on Windows

2014-04-11 Thread Matthew Brett
Hi,

On Fri, Apr 11, 2014 at 5:31 AM, Sturla Molden sturla.mol...@gmail.com wrote:
 Matthew Brett matthew.br...@gmail.com wrote:

 
 This library contains an adaptation of the legacy cblas interface to BLAS for
 C++ AMP. At this point almost all interfaces are not implemented. One
 exception is the ampblas_saxpy and ampblas_daxpy which serve as a
 template for the
 implementation of other routines.
 

 Right, so they gave up.

 By the way, it seems we have forgotten an important Fortran compiler for
 Windows: Portland Group.

 pgroup.com

Thanks for reminding me, I've put that in.

Man, they have an awful license, making it quite useless for
open-source: http://www.pgroup.com/doc/LICENSE.txt

Cheers,

Matthew


Re: [Numpy-discussion] Wiki page for building numerical stuff on Windows

2014-04-11 Thread Sturla Molden
Matthew Brett matthew.br...@gmail.com wrote:

 Man, they have an awful license, making it quite useless for
 open-source: http://www.pgroup.com/doc/LICENSE.txt

Awful, and insanely expensive. :-(

And if you look at ACML, you will find that the MSVC compatible version is
built with the PG compiler. (There is an Intel ifort version too, but the
PG version is the only one that actually works.) So if you want ACML,
beware that it is tainted with a PG license on the runtime libraries.

Sturla



Re: [Numpy-discussion] Wiki page for building numerical stuff on Windows

2014-04-11 Thread Matthew Brett
Hi,

On Fri, Apr 11, 2014 at 4:21 AM, Carl Kleffner cmkleff...@gmail.com wrote:
 Hi,

 a small correction: a recent octave for windows is here:
 http://mxeoctave.osuv.de

 see http://article.gmane.org/gmane.comp.gnu.octave.maintainers/38124 ...
 Binary of octave 3.8.0 on windows is now prepared in voluntary contribution
 by Markus Bergholz.

 a discussion about OpenBLAS on the octave maintainer list:
 http://article.gmane.org/gmane.comp.gnu.octave.maintainers/38746

Thanks - I've put those corrections in as well.

Can you edit the page by the way?  I'm assuming y'all can...

Matthew


Re: [Numpy-discussion] Wiki page for building numerical stuff on Windows

2014-04-11 Thread Aron Ahmadia
Thanks Matthew for putting this page together.

The OpenBLAS guys have been accepting/merging pull requests (their GitHub
tree shows 26 contributors and no open pull requests), and I know that
several people from the Python and Julia community have gotten pull
requests merged.  I modified your comments in the Wiki slightly; feel free
to revert if inappropriate.

A


On Fri, Apr 11, 2014 at 2:10 PM, Matthew Brett matthew.br...@gmail.com wrote:

 Hi,

 On Fri, Apr 11, 2014 at 4:21 AM, Carl Kleffner cmkleff...@gmail.com
 wrote:
  Hi,
 
  a small correction: a recent octave for windows is here:
  http://mxeoctave.osuv.de
 
  see http://article.gmane.org/gmane.comp.gnu.octave.maintainers/38124 ...
  Binary of octave 3.8.0 on windows is now prepared in voluntary
 contribution
  by Markus Bergholz.
 
  a discussion about OpenBLAS on the octave maintainer list:
  http://article.gmane.org/gmane.comp.gnu.octave.maintainers/38746

 Thanks - I've put those corrections in as well.

 Can you edit the page by the way?  I'm assuming y'all can...

 Matthew


Re: [Numpy-discussion] The BLAS problem (was: Re: Wiki page for building numerical stuff on Windows)

2014-04-11 Thread Julian Taylor
On 11.04.2014 18:03, Nathaniel Smith wrote:
 On Fri, Apr 11, 2014 at 12:21 PM, Carl Kleffner cmkleff...@gmail.com wrote:
 a discussion about OpenBLAS on the octave maintainer list:
 http://article.gmane.org/gmane.comp.gnu.octave.maintainers/38746
 
 I'm getting the impression that OpenBLAS is being both a tantalizing
 opportunity and a practical thorn-in-the-side for everyone -- Python,
 Octave, Julia, R.
 
 How crazy would it be to get together an organized effort to fix this
 problem -- OpenBLAS for everyone? E.g., by collecting patches to fix
 the bits we don't like (like unhelpful build system defaults),
 applying more systematic QA, etc. Ideally this could be done upstream,
 but if upstream is MIA or disagrees about OpenBLAS's goals, then it
 could be maintained as a kind of OpenBLAS++ that merges regularly
 from upstream (compare to [1][2][3] for successful projects handled in
 this way). If hardware for testing is a problem, then I suspect
 NumFOCUS would be overjoyed to throw a few kilodollars at buying one
 instance of each widely-distributed microarchitecture released in the
 last few years as a test farm...
 

x86 CPUs are backward compatible with almost all instructions they have
ever introduced, so one machine supporting the latest instruction set is
sufficient to test almost everything. For that, the runtime kernel
selection must be tunable via the environment, so you can use kernels
intended for older CPUs.

The larger issue is finding a good and thorough testsuite that wasn't
written 30 years ago, and thus covers problem sizes larger than a few
megabytes. These are the problem sizes that have often crashed OpenBLAS in
the past.
Isn't there a comprehensive BLAS verification testsuite available
somewhere, which all BLAS implementations could test against and
contribute to? Something like the POSIX compliance testsuite.
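
Absent such a suite, the kind of check it would contain can be sketched
with numpy: compare the linked BLAS dgemm against an unoptimized reference
at several sizes, including an odd one beyond the classic tiny test
matrices. The function name and tolerance below are illustrative, not taken
from any existing suite:

```python
import numpy as np

def check_dgemm(n, rtol=1e-10):
    # Compare the linked BLAS dgemm (via np.dot) against an unoptimized
    # reference computed with einsum, at a given problem size.
    rng = np.random.RandomState(0)
    a = rng.rand(n, n)
    b = rng.rand(n, n)
    ref = np.einsum('ik,kj->ij', a, b)  # naive triple-loop semantics
    if not np.allclose(np.dot(a, b), ref, rtol=rtol):
        raise AssertionError("dgemm mismatch at n=%d" % n)

# Include an odd size, which tends to exercise edge-case kernels.
for n in (3, 64, 257):
    check_dgemm(n)
print("ok")
```

A real verification suite would push n into the hundreds-of-megabytes range
Julian mentions; the structure of the check stays the same.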


Re: [Numpy-discussion] The BLAS problem (was: Re: Wiki page for building numerical stuff on Windows)

2014-04-11 Thread Matthew Brett
Hi,

On Fri, Apr 11, 2014 at 9:03 AM, Nathaniel Smith n...@pobox.com wrote:
 On Fri, Apr 11, 2014 at 12:21 PM, Carl Kleffner cmkleff...@gmail.com wrote:
 a discussion about OpenBLAS on the octave maintainer list:
 http://article.gmane.org/gmane.comp.gnu.octave.maintainers/38746

 I'm getting the impression that OpenBLAS is being both a tantalizing
 opportunity and a practical thorn-in-the-side for everyone -- Python,
 Octave, Julia, R.

 How crazy would it be to get together an organized effort to fix this
 problem -- OpenBLAS for everyone? E.g., by collecting patches to fix
 the bits we don't like (like unhelpful build system defaults),
 applying more systematic QA, etc. Ideally this could be done upstream,
 but if upstream is MIA or disagrees about OpenBLAS's goals, then it
 could be maintained as a kind of OpenBLAS++ that merges regularly
 from upstream (compare to [1][2][3] for successful projects handled in
 this way). If hardware for testing is a problem, then I suspect
 NumFOCUS would be overjoyed to throw a few kilodollars at buying one
 instance of each widely-distributed microarchitecture released in the
 last few years as a test farm...

 I think the goal is pretty clear: a modern optionally-multithreaded
 BLAS under a BSD-like license with a priority on correctness,
 out-of-the-box functionality (like runtime configurability and feature
 detection), speed, and portability, in that order.

It sounds like a joint conversation with R, Julia, Octave team at
least would be useful here,

Anyone volunteer for starting that conversation?

Cheers,

Matthew


Re: [Numpy-discussion] The BLAS problem (was: Re: Wiki page for building numerical stuff on Windows)

2014-04-11 Thread Julian Taylor
On 11.04.2014 19:05, Sturla Molden wrote:
 Sturla Molden sturla.mol...@gmail.com wrote:
 
 Making a totally new BLAS might seem like a crazy idea, but it might be the
 best solution in the long run. 
 
 To see if this can be done, I'll try to re-implement cblas_dgemm and then
 benchmark against MKL, Accelerate and OpenBLAS. If I can get the
 performance better than 75% of their speed, without any assembly or dark
 magic, just plain C99 compiled with Intel icc, that would be sufficient for
 binary wheels on Windows I think.
 


hi,
if you can, also give gcc with Graphite a try. Its loop transformations
should give you results similar to manual blocking if the compiler is
able to understand the loop; see
http://gcc.gnu.org/gcc-4.4/changes.html
-floop-strip-mine
-floop-block
-floop-interchange
plus a couple of options to tune the parameters.

You may need gcc 4.8 for it to work properly on loop iteration counts that
are not fixed at compile time.
As far as I know, clang/LLVM also has Graphite integration.

Cheers,
Julian


Re: [Numpy-discussion] Wiki page for building numerical stuff on Windows

2014-04-11 Thread Matthew Brett
Hi,

On Fri, Apr 11, 2014 at 11:26 AM, Aron Ahmadia a...@ahmadia.net wrote:
 Thanks Matthew for putting this page together.

 The OpenBLAS guys have been accepting/merging pull requests (their GitHub
 tree shows 26 contributors and no open pull requests), and I know that
 several people from the Python and Julia community have gotten pull requests
 merged.  I modified your comments in the Wiki slightly; feel free to revert
 if inappropriate.

Excellent - thanks for that,

Matthew


Re: [Numpy-discussion] The BLAS problem (was: Re: Wiki page for building numerical stuff on Windows)

2014-04-11 Thread Nathaniel Smith
On Fri, Apr 11, 2014 at 6:05 PM, Sturla Molden sturla.mol...@gmail.com wrote:
 Sturla Molden sturla.mol...@gmail.com wrote:

 Making a totally new BLAS might seem like a crazy idea, but it might be the
 best solution in the long run.

 To see if this can be done, I'll try to re-implement cblas_dgemm and then
 benchmark against MKL, Accelerate and OpenBLAS. If I can get the
 performance better than 75% of their speed, without any assembly or dark
 magic, just plain C99 compiled with Intel icc, that would be sufficient for
 binary wheels on Windows I think.

Sounds like a worthwhile experiment!

My suspicion is that we'll be better off starting with something
that is almost good enough (OpenBLAS) and then incrementally improving
it to meet our needs, rather than starting from scratch -- there's a
*long* way to get from dgemm to a fully supported BLAS project -- but
no matter what it'll generate useful data, and possibly some useful
code that could either be the basis of something new or integrated
into whatever we do end up doing.

Also, while Windows is maybe in the worst shape, all platforms would
seriously benefit from the existence of a reliable speed-competitive
binary-distribution-compatible BLAS that doesn't break fork().

-n

-- 
Nathaniel J. Smith
Postdoctoral researcher - Informatics - University of Edinburgh
http://vorpus.org


Re: [Numpy-discussion] The BLAS problem (was: Re: Wiki page for building numerical stuff on Windows)

2014-04-11 Thread Julian Taylor
On 11.04.2014 18:03, Nathaniel Smith wrote:
 On Fri, Apr 11, 2014 at 12:21 PM, Carl Kleffner cmkleff...@gmail.com wrote:
 a discussion about OpenBLAS on the octave maintainer list:
 http://article.gmane.org/gmane.comp.gnu.octave.maintainers/38746
 
 I'm getting the impression that OpenBLAS is being both a tantalizing
 opportunity and a practical thorn-in-the-side for everyone -- Python,
 Octave, Julia, R.
 

does anyone have experience with BLIS?
https://code.google.com/p/blis/
https://github.com/flame/blis

from the description it looks interesting and its BSD licensed.
though windows support is experimental according to the FAQ.


Re: [Numpy-discussion] Wiki page for building numerical stuff on Windows

2014-04-11 Thread Matthew Brett
Hi,

On Fri, Apr 11, 2014 at 10:49 AM, Sturla Molden sturla.mol...@gmail.com wrote:
 Matthew Brett matthew.br...@gmail.com wrote:

 Man, they have an awful license, making it quite useless for
 open-source: http://www.pgroup.com/doc/LICENSE.txt

 Awful, and insanely expensive. :-(

 And if you look at ACML, you will find that the MSVC compatible version is
 built with the PG compiler. (There is an Intel ifort version too, but the
 PG version is the only one that actually works.) So if you want ACML,
 beware that it is tainted with a PG license on the runtime libraries.

The ACML license does not include those terms, unless I missed them:

http://amd-dev.wpengine.netdna-cdn.com/wordpress/media/2013/12/ACML_June_24_2010_v2.pdf

I assume that AMD negotiated to release themselves from full terms of
the PG license, but if anyone knows differently, please say...

Cheers,

Matthew


Re: [Numpy-discussion] The BLAS problem (was: Re: Wiki page for building numerical stuff on Windows)

2014-04-11 Thread Nathaniel Smith
On Fri, Apr 11, 2014 at 7:53 PM, Julian Taylor
jtaylor.deb...@googlemail.com wrote:
 On 11.04.2014 18:03, Nathaniel Smith wrote:
 On Fri, Apr 11, 2014 at 12:21 PM, Carl Kleffner cmkleff...@gmail.com wrote:
 a discussion about OpenBLAS on the octave maintainer list:
 http://article.gmane.org/gmane.comp.gnu.octave.maintainers/38746

 I'm getting the impression that OpenBLAS is being both a tantalizing
 opportunity and a practical thorn-in-the-side for everyone -- Python,
 Octave, Julia, R.


 does anyone have experience with BLIS?
 https://code.google.com/p/blis/
 https://github.com/flame/blis

Also:

Does BLIS automatically detect my hardware?

Not yet. For now, BLIS requires the user/developer to manually specify
an existing configuration that corresponds to the hardware for which
to build a BLIS library.

So for now, BLIS is mostly a developer's tool?

Yes. In order to achieve high performance, BLIS requires that
hand-coded kernels and micro-kernels be written and referenced in a
valid BLIS configuration. These components are usually written by
developers and then included within BLIS for use by others.

If high performance is not important, then you can always build the
reference implementation on any hardware platform. The reference
implementation does not contain any machine-specific code and thus
should be very portable.

Does BLIS support multithreading?

BLIS does not yet implement multithreaded versions of its operations.
However, BLIS can very easily be made thread-safe so that you can call
BLIS from threads[...]

Can I build BLIS as a shared library?

The BLIS build system is not yet capable of outputting a shared library. [...]

https://code.google.com/p/blis/wiki/FAQ

-- 
Nathaniel J. Smith
Postdoctoral researcher - Informatics - University of Edinburgh
http://vorpus.org


Re: [Numpy-discussion] The BLAS problem (was: Re: Wiki page for building numerical stuff on Windows)

2014-04-11 Thread Matthew Brett
Hi,

On Fri, Apr 11, 2014 at 10:05 AM, Sturla Molden sturla.mol...@gmail.com wrote:
 Sturla Molden sturla.mol...@gmail.com wrote:

 Making a totally new BLAS might seem like a crazy idea, but it might be the
 best solution in the long run.

 To see if this can be done, I'll try to re-implement cblas_dgemm and then
 benchmark against MKL, Accelerate and OpenBLAS. If I can get the
 performance better than 75% of their speed, without any assembly or dark
 magic, just plain C99 compiled with Intel icc, that would be sufficient for
 binary wheels on Windows I think.

Did you check out the Intel license though?

http://software.intel.com/sites/default/files/managed/95/23/Intel_SW_Dev_Products__EULA.pdf

D. DISTRIBUTION: Distribution of the Redistributables is also subject
to the following limitations: You
(i) shall be solely responsible to your customers for any update or
support obligation or other liability
which may arise from the distribution, (ii) shall not make any
statement that your product is
certified, or that its performance is guaranteed, by Intel, (iii)
shall not use Intel's name or
trademarks to market your product without written permission, (iv)
shall use a license agreement
that prohibits disassembly and reverse engineering of the
Redistributables, (v) shall indemnify, hold
harmless, and defend Intel and its suppliers from and against any
claims or lawsuits, including
attorney's fees, that arise or result from your distribution of any product.

Are you sure that you can redistribute object code statically linked
against icc runtimes?

Cheers,

Matthew


Re: [Numpy-discussion] The BLAS problem (was: Re: Wiki page for building numerical stuff on Windows)

2014-04-11 Thread Sturla Molden
On 11/04/14 23:11, Matthew Brett wrote:

 Are you sure that you can redistribute object code statically linked
 against icc runtimes?

I am not a lawyer...






Re: [Numpy-discussion] The BLAS problem (was: Re: Wiki page for building numerical stuff on Windows)

2014-04-11 Thread Matthew Brett
On Fri, Apr 11, 2014 at 2:58 PM, Sturla Molden sturla.mol...@gmail.com wrote:
 On 11/04/14 23:11, Matthew Brett wrote:

 Are you sure that you can redistribute object code statically linked
 against icc runtimes?

 I am not a lawyer...

No - sure - but it would be frustrating if you found yourself
optimizing with a compiler that is useless for subsequent open-source
builds.

Best,

Matthew


Re: [Numpy-discussion] The BLAS problem (was: Re: Wiki page for building numerical stuff on Windows)

2014-04-11 Thread Sturla Molden
On 12/04/14 00:01, Matthew Brett wrote:

 No - sure - but it would be frustrating if you found yourself
 optimizing with a compiler that is useless for subsequent open-source
 builds.

No, I think MSVC or gcc 4.8/4.9 will work too. It's just that I happen 
to have icc and clang on this computer :)

Sturla





Re: [Numpy-discussion] Wiki page for building numerical stuff on Windows

2014-04-11 Thread Sturla Molden
On 11/04/14 04:44, Matthew Brett wrote:

 I've been working on a general wiki page on building numerical stuff on 
 Windows:

 https://github.com/numpy/numpy/wiki/Numerical-software-on-Windows

I am worried that the conclusion will be that there is no viable BLAS 
alternative on Windows...


Sturla




Re: [Numpy-discussion] Dates and times and Datetime64 (again)

2014-04-11 Thread Sankarshan Mudkavi
So is the consensus that we don't accept any tags at all (not even 
temporarily)? Would that break too much existing code?

Cheers,
Sankarshan

On Apr 1, 2014, at 2:50 PM, Alexander Belopolsky ndar...@mac.com wrote:

 
 On Tue, Apr 1, 2014 at 1:12 PM, Nathaniel Smith n...@pobox.com wrote:
 In [6]: a[0] = "garbage"
 ValueError: could not convert string to float: garbage
 
 (Cf. "Errors should never pass silently.") Any reason why datetime64
 should be different?
 
 datetime64 is different because it has NaT support from the start.  NaN 
 support for floats seems to be an afterthought if not an accident of 
 implementation.
 
 And it looks like some errors do pass silently:
 
  a[0] = 1
 # not a TypeError
 
 But I withdraw my suggestion.  The closer datetime64 behavior is to numeric 
 types the better.
 

-- 
Sankarshan Mudkavi
Undergraduate in Physics, University of Waterloo
www.smudkavi.com



Re: [Numpy-discussion] The BLAS problem (was: Re: Wiki page for building numerical stuff on Windows)

2014-04-11 Thread Sturla Molden
On 11/04/14 20:47, Nathaniel Smith wrote:

 Also, while Windows is maybe in the worst shape, all platforms would
 seriously benefit from the existence of a reliable speed-competitive
 binary-distribution-compatible BLAS that doesn't break fork().

Windows is worst off, yes.

I don't think fork breakage by Accelerate is a big problem on Mac OS X. 
Apple has made clear that only POSIX APIs are fork safe. And actually 
this is now recognized as an error in multiprocessing and fixed in 
Python 3.4:

multiprocessing.set_start_method('spawn')
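
A minimal, self-contained sketch of that fix (the worker function is
illustrative): with the 'spawn' start method each child is a fresh
interpreter, so a library that is not fork-safe gets re-initialized per
process instead of inheriting broken state across fork():

```python
import multiprocessing as mp

def square(x):
    return x * x

if __name__ == '__main__':
    # Python 3.4+: start children with a fresh interpreter ('spawn') instead
    # of fork(), so non-fork-safe libraries (e.g. Accelerate on OS X) are
    # re-initialized in each child.
    mp.set_start_method('spawn')
    with mp.Pool(2) as pool:
        print(pool.map(square, [0, 1, 2, 3]))  # -> [0, 1, 4, 9]
```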

On Linux the distributions will usually ship with prebuilt ATLAS.


Sturla



Re: [Numpy-discussion] Dates and times and Datetime64 (again)

2014-04-11 Thread Nathaniel Smith
On Fri, Apr 11, 2014 at 11:25 PM, Sankarshan Mudkavi
smudk...@uwaterloo.ca wrote:
 So is the consensus that we don't accept any tags at all (not even
 temporarily)? Would that break too much existing code?

Well, we don't know. If anyone has any ideas on how to figure it out
then they should speak up :-).

Barring any brilliant suggestions though, I suggest we just go ahead
with disallowing all timezone tags for now. We can always change our
mind as we get closer to the release and people start experimenting
with the new code.

-n

-- 
Nathaniel J. Smith
Postdoctoral researcher - Informatics - University of Edinburgh
http://vorpus.org


Re: [Numpy-discussion] The BLAS problem (was: Re: Wiki page for building numerical stuff on Windows)

2014-04-11 Thread Nathaniel Smith
On Fri, Apr 11, 2014 at 11:26 PM, Sturla Molden sturla.mol...@gmail.com wrote:
 On 11/04/14 20:47, Nathaniel Smith wrote:

 Also, while Windows is maybe in the worst shape, all platforms would
 seriously benefit from the existence of a reliable speed-competitive
 binary-distribution-compatible BLAS that doesn't break fork().

 Windows is worst off, yes.

 I don't think fork breakage by Accelerate is a big problem on Mac OS X.
 Apple has made clear that only POSIX APIs are fork safe. And actually
 this is now recognized as an error in multiprocessing and fixed in
 Python 3.4:

 multiprocessing.set_start_method('spawn')

I don't really care whether it's *documented* that BLAS and fork are
incompatible. I care whether it *works*, because it is useful
functionality :-).

The spawn mode is fine and all, but (a) the presence of something in
3.4 helps only a minority of users, (b) spawn is not a full
replacement for fork; with large read-mostly data sets it can be a
*huge* win to load them into the parent process and then let them be
COW-inherited by forked children. ATM the only other way to work with
a data set that's larger than memory-divided-by-numcpus is to
explicitly set up shared memory, and this is *really* hard for
anything more complicated than a single flat array.
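(A sketch of the fork/COW pattern being described — POSIX-only, with a small array standing in for a huge read-mostly data set:)

```python
import multiprocessing as mp
import numpy as np

# Loaded once in the parent.  Forked children see the same pages via
# copy-on-write: no pickling, no explicit shared-memory setup.
BIG = np.arange(1000000, dtype=np.float64)

def partial_sum(args):
    lo, hi = args
    # a pure read: the OS never needs to copy these pages
    return BIG[lo:hi].sum()

if __name__ == "__main__":
    ctx = mp.get_context("fork")  # this pattern is fork-specific (no Windows)
    with ctx.Pool(2) as pool:
        parts = pool.map(partial_sum, [(0, 500000), (500000, 1000000)])
    print(sum(parts) == BIG.sum())  # True
```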

 On Linux the distributions will usually ship with prebuilt ATLAS.

And it's generally recommended that everyone rebuild their own ATLAS
anyway. I can do it, but I'd much rather be able to install a BLAS
library that just worked. (Presumably this is a large part of why
scipy-stack distributors prefer MKL over ATLAS.)

If it comes down to it then of course I'd rather have a Windows-only
BLAS than no BLAS at all. I just don't think we should be setting our
sights so low at this point. The marginal cost of portability doesn't
seem high.

Besides, even Windows users will benefit more from having a standard
cross-platform BLAS that everyone uses -- it would mean lots more
people familiar with the library's quirks, better testing, etc.

-- 
Nathaniel J. Smith
Postdoctoral researcher - Informatics - University of Edinburgh
http://vorpus.org


Re: [Numpy-discussion] Dates and times and Datetime64 (again)

2014-04-11 Thread Charles R Harris
On Fri, Apr 11, 2014 at 4:25 PM, Sankarshan Mudkavi
smudk...@uwaterloo.cawrote:

 So is the consensus that we don't accept any tags at all (not even
 temporarily)? Would that break too much existing code?

 Cheers,
 Sankarshan

 On Apr 1, 2014, at 2:50 PM, Alexander Belopolsky ndar...@mac.com wrote:


 On Tue, Apr 1, 2014 at 1:12 PM, Nathaniel Smith n...@pobox.com wrote:

 In [6]: a[0] = "garbage"
 ValueError: could not convert string to float: garbage

 (Cf. "Errors should never pass silently.")  Any reason why datetime64
 should be different?


 datetime64 is different because it has NaT support from the start.  NaN
 support for floats seems to be an afterthought if not an accident of
 implementation.

 And it looks like some errors do pass silently:

  >>> a[0] = 1
  # not a TypeError

 But I withdraw my suggestion.  The closer datetime64 behavior is to
 numeric types the better.




Are we in a position to start looking at implementation? If so, it would be
useful to have a collection of test cases, i.e., typical uses with
specified results. That should also cover conversion from/(to?)
datetime.datetime.

Chuck


Re: [Numpy-discussion] The BLAS problem (was: Re: Wiki page for building numerical stuff on Windows)

2014-04-11 Thread Sturla Molden
On 12/04/14 00:39, Nathaniel Smith wrote:

 The spawn mode is fine and all, but (a) the presence of something in
 3.4 helps only a minority of users, (b) spawn is not a full
 replacement for fork;

It basically does the same as on Windows. If you want portability to 
Windows, you must abide by these restrictions anyway.


 with large read-mostly data sets it can be a
 *huge* win to load them into the parent process and then let them be
 COW-inherited by forked children.

The thing is that Python reference counting breaks COW fork. This has been 
discussed several times on the python-dev list. What happens is that as 
soon as the child process updates a refcount, the OS copies the page. 
And because of how Python behaves, this copying of COW-marked pages 
quickly gets excessive. Effectively, the performance of os.fork in Python 
will be close to that of a non-COW fork. A suggested solution is to move 
the refcounts out of the PyObject struct, and perhaps keep them in a 
dedicated heap. But doing so would be cache-unfriendly.


 ATM the only other way to work with
 a data set that's larger than memory-divided-by-numcpus is to
 explicitly set up shared memory, and this is *really* hard for
 anything more complicated than a single flat array.


Not difficult. You just go to my GitHub site and grab the code ;)

(I have some problems running it on my MBP though, not sure why, but it 
used to work on Linux and Windows, and possibly still does.)

https://github.com/sturlamolden/sharedmem-numpy


Sturla








Re: [Numpy-discussion] The BLAS problem (was: Re: Wiki page for building numerical stuff on Windows)

2014-04-11 Thread Nathaniel Smith
On Sat, Apr 12, 2014 at 12:07 AM, Sturla Molden sturla.mol...@gmail.com wrote:
 On 12/04/14 00:39, Nathaniel Smith wrote:

 The spawn mode is fine and all, but (a) the presence of something in
 3.4 helps only a minority of users, (b) spawn is not a full
 replacement for fork;

 It basically does the same as on Windows. If you want portability to
 Windows, you must abide by these restrictions anyway.

Yes, but "sorry Unix guys, we've decided to take away this nice
feature from you because it doesn't work on Windows" is a really
terrible argument. If it can't be made to work, then fine, but fork
safety is just not *that* much to ask.

 with large read-mostly data sets it can be a
 *huge* win to load them into the parent process and then let them be
 COW-inherited by forked children.

 The thing is that Python reference counting breaks COW fork. This has been
 discussed several times on the python-dev list. What happens is that as
 soon as the child process updates a refcount, the OS copies the page.
 And because of how Python behaves, this copying of COW-marked pages
 quickly gets excessive. Effectively, the performance of os.fork in Python
 will be close to that of a non-COW fork. A suggested solution is to move
 the refcounts out of the PyObject struct, and perhaps keep them in a
 dedicated heap. But doing so would be cache-unfriendly.

Yes, it's limited, but again this is not a reason to break it in the
cases where it *does* work. The case where I ran into this was loading
a big language model using SRILM:
  http://www.speech.sri.com/projects/srilm/
  https://github.com/njsmith/pysrilm
This produces a single Python object that references an opaque,
tens-of-gigabytes mess of C++ objects. For this case explicit shared
mem is useless, but fork worked brilliantly.

-n

-- 
Nathaniel J. Smith
Postdoctoral researcher - Informatics - University of Edinburgh
http://vorpus.org


Re: [Numpy-discussion] The BLAS problem (was: Re: Wiki page for building numerical stuff on Windows)

2014-04-11 Thread Nathaniel Smith
On Fri, Apr 11, 2014 at 7:29 PM, Julian Taylor
jtaylor.deb...@googlemail.com wrote:
 x86 cpus are backward compatible with almost all instructions they ever
 introduced, so one machine with the latest instruction set supported is
 sufficient to test almost everything.
 For that the runtime kernel selection must be tuneable via the
 environment so you can use kernels intended for older cpus.

Overriding runtime kernel selection sounds like a good bite-sized
feature that could be added to OpenBLAS...
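(For what it's worth, OpenBLAS does expose an environment override along these lines: the `OPENBLAS_CORETYPE` variable forces a particular kernel set at load time in dynamic-arch builds. A sketch, assuming a NumPy linked against such an OpenBLAS — on other BLAS builds the variable is simply ignored:)

```shell
# Force the kernels for an older microarchitecture.  The name must be a
# core type OpenBLAS knows (e.g. NEHALEM, SANDYBRIDGE, HASWELL); the
# computation itself should give the same answer under any kernel set.
OPENBLAS_CORETYPE=NEHALEM python3 -c "import numpy as np; a = np.ones((256, 256)); print(np.dot(a, a)[0, 0])"  # prints 256.0
```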

 The larger issue is finding a good and thorough testsuite that wasn't
 written 30 years ago and thus does cover problem sizes larger than a
 few megabytes. These are the problem sizes that often crashed
 OpenBLAS in the past.
 Isn't there a kind of comprehensive BLAS verification testsuite that
 all BLAS implementations test against and contribute to, available
 somewhere? E.g. like the POSIX compliance testsuite.

I doubt it! Someone could make a good start on one in an afternoon
though. (Only a start, but half a test suite is a heck of a lot better
than nothing.)
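(A start could be as simple as property-testing the installed BLAS's GEMM against a naive reference across a sweep of sizes. A sketch via np.dot, which dispatches to whatever BLAS NumPy was built against:)

```python
import numpy as np

def naive_gemm(a, b):
    """Reference O(n^3) matrix multiply -- slow but obviously correct."""
    n, k = a.shape
    k2, m = b.shape
    assert k == k2
    out = np.zeros((n, m))
    for i in range(n):
        for j in range(m):
            out[i, j] = sum(a[i, p] * b[p, j] for p in range(k))
    return out

def check_gemm(n, rtol=1e-10):
    rng = np.random.RandomState(0)
    a = rng.rand(n, n)
    b = rng.rand(n, n)
    # np.dot goes through the installed BLAS's dgemm for 2-D float64 arrays
    assert np.allclose(np.dot(a, b), naive_gemm(a, b), rtol=rtol)

for n in (1, 3, 17, 64):   # a real suite would sweep up into the sizes
    check_gemm(n)           # that historically crashed OpenBLAS
```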

-n

-- 
Nathaniel J. Smith
Postdoctoral researcher - Informatics - University of Edinburgh
http://vorpus.org


Re: [Numpy-discussion] The BLAS problem (was: Re: Wiki page for building numerical stuff on Windows)

2014-04-11 Thread Sturla Molden
On 12/04/14 01:07, Sturla Molden wrote:

 ATM the only other way to work with
 a data set that's larger than memory-divided-by-numcpus is to
 explicitly set up shared memory, and this is *really* hard for
 anything more complicated than a single flat array.


 Not difficult. You just go to my GitHub site and grab the code ;)

 (I have some problems running it on my MBP though, not sure why, but it
 used to work on Linux and Windows, and possibly still does.)

 https://github.com/sturlamolden/sharedmem-numpy

Hmm, today it works fine on my MBP too... Good. :)


import multiprocessing as mp
import numpy as np
import sharedmem as shm

def proc(qin, qout):
    print("grabbing array from queue")
    a = qin.get()
    print(a)
    print("putting array in queue")
    b = shm.zeros(10)
    print(b)
    qout.put(b)
    print("waiting for array to be updated by another process")
    a = qin.get()
    print(b)

if __name__ == "__main__":
    qin = mp.Queue()
    qout = mp.Queue()
    p = mp.Process(target=proc, args=(qin, qout))
    p.start()
    a = shm.zeros(4)
    qin.put(a)
    b = qout.get()
    b[:] = range(10)
    qin.put(None)
    p.join()

sturla$ python example.py
grabbing array from queue
[ 0.  0.  0.  0.]
putting array in queue
[ 0.  0.  0.  0.  0.  0.  0.  0.  0.  0.]
waiting for array to be updated by another process
[ 0.  1.  2.  3.  4.  5.  6.  7.  8.  9.]


Sturla




Re: [Numpy-discussion] The BLAS problem (was: Re: Wiki page for building numerical stuff on Windows)

2014-04-11 Thread Nathaniel Smith
Okay, I started taking notes here:
  https://github.com/numpy/numpy/wiki/BLAS-desiderata

Please add as appropriate...

-n

On Sat, Apr 12, 2014 at 12:19 AM, Nathaniel Smith n...@pobox.com wrote:
 On Fri, Apr 11, 2014 at 7:29 PM, Julian Taylor
 jtaylor.deb...@googlemail.com wrote:
 x86 cpus are backward compatible with almost all instructions they ever
 introduced, so one machine with the latest instruction set supported is
 sufficient to test almost everything.
 For that the runtime kernel selection must be tuneable via the
 environment so you can use kernels intended for older cpus.

 Overriding runtime kernel selection sounds like a good bite-sized
 feature that could be added to OpenBLAS...

  The larger issue is finding a good and thorough testsuite that wasn't
  written 30 years ago and thus does cover problem sizes larger than a
  few megabytes. These are the problem sizes that often crashed
  OpenBLAS in the past.
  Isn't there a kind of comprehensive BLAS verification testsuite that
  all BLAS implementations test against and contribute to, available
  somewhere? E.g. like the POSIX compliance testsuite.

  I doubt it! Someone could make a good start on one in an afternoon
  though. (Only a start, but half a test suite is a heck of a lot better
  than nothing.)

 -n

 --
 Nathaniel J. Smith
 Postdoctoral researcher - Informatics - University of Edinburgh
 http://vorpus.org



-- 
Nathaniel J. Smith
Postdoctoral researcher - Informatics - University of Edinburgh
http://vorpus.org


Re: [Numpy-discussion] Dates and times and Datetime64 (again)

2014-04-11 Thread Stephan Hoyer
On Fri, Apr 11, 2014 at 3:56 PM, Charles R Harris charlesr.har...@gmail.com
 wrote:

 Are we in a position to start looking at implementation? If so, it would
 be useful to have a collection of test cases, i.e., typical uses with
 specified results. That should also cover conversion from/(to?)
 datetime.datetime.


Indeed, my personal wish-list for np.datetime64 is centered much more on
robust conversion to/from native date objects, including comparison.

Here are some of my particular points of frustration (apologies for the
thread jacking!):
- NaT should have similar behavior to NaN when used for comparisons (i.e.,
comparisons should always be False).
- You can't compare a datetime object to a datetime64 object.
- datetime64 objects with high precision (e.g., ns) can't compare to
datetime objects.

Pandas has a very nice wrapper around datetime64 arrays that solves most of
these issues, but it would be nice to get much of that functionality in
core numpy, since I don't always want to store my values in a 1-dimensional
array + hash-table (the pandas Index):
http://pandas.pydata.org/pandas-docs/stable/timeseries.html

Here's code which reproduces all of the above:

import numpy as np
from datetime import datetime

print np.datetime64('NaT') < np.datetime64('2011-01-01')  # this should not
evaluate to true
print datetime(2010, 1, 1) < np.datetime64('2011-01-01')  # raises exception
print np.datetime64('2011-01-01T00:00', 'ns') < datetime(2010, 1, 1)  #
another exception
print np.datetime64('2011-01-01T00:00') < datetime(2010, 1, 1)  # finally
something works!


Re: [Numpy-discussion] Dates and times and Datetime64 (again)

2014-04-11 Thread Alexander Belopolsky
On Fri, Apr 11, 2014 at 7:58 PM, Stephan Hoyer sho...@gmail.com wrote:

 print datetime(2010, 1, 1) < np.datetime64('2011-01-01')  # raises exception


This is somewhat consistent with

>>> from datetime import *
>>> datetime(2010, 1, 1) < date(2010, 1, 1)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: can't compare datetime.datetime to datetime.date

but I would expect date(2010, 1, 1) == np.datetime64('2011-01-01') to return
False.