Re: [Numpy-discussion] __numpy_ufunc__

2014-07-16 Thread Nathaniel Smith
Weirdly, I never received Chuck's original email in this thread. Should
some list admin be informed?

I also am not sure what/where Julian's comments were, so I second the call
for context :-). Putting it off until 1.10 doesn't seem like an obviously
bad idea to me, but specifics would help...

(__numpy_ufunc__ is the new system for allowing arbitrary third party
objects to override how ufuncs are applied to them, i.e. it means
np.sin(sparsemat) and np.sin(gpuarray) can be defined to do something
sensible. Conceptually it replaces the old __array_prepare__/__array_wrap__
system, which was limited to ndarray subclasses and has major limits on
what you can do. Of course __array_prepare/wrap__ will also continue to be
supported for compatibility.)
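For illustration, a minimal sketch of what a third-party container might
implement, assuming the provisional hook signature
(ufunc, method, i, inputs, **kwargs) being discussed for 1.9/1.10; MyArray and
its behaviour here are hypothetical:

import numpy as np

class MyArray(object):
    """Hypothetical container that wants to intercept ufunc calls."""
    def __init__(self, data):
        self.data = np.asarray(data)

    def __numpy_ufunc__(self, ufunc, method, i, inputs, **kwargs):
        # Only handle plain ufunc(...) calls; defer reductions etc.
        if method != '__call__':
            return NotImplemented
        unwrapped = [x.data if isinstance(x, MyArray) else x for x in inputs]
        return MyArray(ufunc(*unwrapped, **kwargs))

# With the hook in place, np.sin(MyArray([0.0, 1.5])) would come back as a
# MyArray instead of the operand being coerced to a plain ndarray.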

-n
On 16 Jul 2014 00:10, Benjamin Root ben.r...@ou.edu wrote:

 Perhaps a bit of context might be useful? How is __numpy_ufunc__ different
 from the ufuncs that we know and love? What are the known implications?
 What are the known shortcomings? Are there ABI and/or API concerns between
 1.9 and 1.10?

 Ben Root


 On Mon, Jul 14, 2014 at 2:22 PM, Charles R Harris 
 charlesr.har...@gmail.com wrote:

 Hi All,

 Julian has raised the question of including __numpy_ufunc__ in numpy 1.9. I
 don't feel strongly one way or the other, but it doesn't seem to be
 finished yet and 1.10 might be a better place to work out the remaining
 problems along with the astropy folks testing possible uses.

 Thoughts?

 Chuck



Re: [Numpy-discussion] Bug in np.cross for 2D vectors

2014-07-16 Thread Jaime Fernández del Río
On Tue, Jul 15, 2014 at 2:22 AM, Neil Hodgson hodgson.n...@yahoo.co.uk
wrote:

 Hi,

 We came across this bug while using np.cross on 3D arrays of 2D vectors.


What version of numpy are you using? This should already be solved in numpy
master, and be part of the 1.9 release. Here's the relevant commit,
although the code has been cleaned up a bit in later ones:

https://github.com/numpy/numpy/commit/b9454f50f23516234c325490913224c3a69fb122
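For reference, the kind of call the report concerns -- np.cross broadcast over
stacked 2-element vectors, where the result is the scalar z-component (the
shapes and values below are just an example):

import numpy as np

# shape (2, 3, 2): a 3D array whose last axis holds 2D (x, y) vectors
a = np.array([[[1., 0.], [0., 1.], [1., 1.]],
              [[2., 0.], [0., 2.], [2., 2.]]])
b = np.array([[[0., 1.], [1., 0.], [1., -1.]],
              [[0., 2.], [2., 0.], [2., -2.]]])

# For 2-element vectors np.cross returns only the z-component, so on a fixed
# NumPy this gives a (2, 3) array of scalars.
print(np.cross(a, b))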

Jaime

-- 
(\__/)
( O.o)
(> <) This is Rabbit. Copy Rabbit into your signature and help him with his
plans for world domination.


Re: [Numpy-discussion] String type again.

2014-07-16 Thread Aldcroft, Thomas
On Sat, Jul 12, 2014 at 8:02 PM, Nathaniel Smith n...@pobox.com wrote:

 On 12 Jul 2014 23:06, Charles R Harris charlesr.har...@gmail.com
 wrote:
 
  As previous posts have pointed out, Numpy's `S` type is currently
 treated as a byte string, which leads to more complicated code in python3.
 OTOH, the unicode type is stored as UCS4, which consumes a lot of space,
 especially for ascii strings. This note proposes to adapt the currently
 existing 'a' type letter, currently aliased to 'S', as a new fixed encoding
 dtype. Python 3.3 introduced two one byte internal representations for
 unicode strings, ascii and latin1. Ascii has the advantage that it is a
 subset of UTF-8, whereas latin1 has a few more symbols. Another possibility
 is to just make it a UTF-8 encoding, but I think this would involve more
 overhead as Python would need to determine the maximum character size.
 These are just preliminary thoughts, comments are welcome.

 I feel like for most purposes, what we *really* want is a variable length
 string dtype (i.e., where each element can be a different length). Pandas
 pays quite some price in overhead to fake this right now. Adding such a
 thing will cause some problems regarding compatibility (what to do with
 array([foo])) and education, but I think it's worth it in the long run. A
 variable length string with out-of-band storage would also allow for a lot
 of py3.3-style storage tricks if we want them.

 Given that, though, I'm a little dubious about adding a third fixed length
 string type, since it seems like it might be a temporary patch, yet raises
 the prospect of having to indefinitely support *5* distinct string types (3
 of which will map to py3 str)...

 OTOH, fixed length nul padded latin1 would be useful for various flat file
 reading tasks.

As one of the original agitators for this, let me re-iterate that what the
astronomical community *really* wants is the original proposal as described
by Chris Barker [1] and essentially what Charles said.  We have large data
archives that have ASCII string data in binary formats like FITS and HDF5.
 The current readers for those datasets present users with numpy S data
types, which in Python 3 cannot be compared to str (unicode) literals.  In
many cases those datasets are large, and in my case I regularly deal with
multi-Gb sized bytestring arrays.  Converting those to a U dtype is not
practical.
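For anyone who has not hit this, a minimal illustration of the mismatch (the
array contents are made up):

import numpy as np

names = np.array([b"alpha", b"beta"], dtype="S5")  # what a FITS/HDF5 reader hands back

# On Python 3 the bytes elements never compare equal to str literals, so
# selections silently come back empty.
print(names == "alpha")    # no matches: scalar False or an all-False array, depending on numpy version
print(names == b"alpha")   # array([ True, False])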

This issue is the sole blocker that I personally have in beginning to move
our operations code base to be Python 3 compatible, and eventually actually
baselining Python 3.

A variable length string would be great, but it feels like a different (and
more difficult) problem to me.  If, however, this can be the solution to
the problem I described, and it can be implemented in a finite time, then
I'm all for it!  :-)

I hate begging for features with no chance of contributing much to the
implementation (lacking the necessary expertise in numpy internals).  I
would be happy to draft a NEP if that will help the process.

Cheers,
Tom

[1]:
http://mail.scipy.org/pipermail/numpy-discussion/2014-January/068622.html

 -n



Re: [Numpy-discussion] __numpy_ufunc__

2014-07-16 Thread Ralf Gommers
On Mon, Jul 14, 2014 at 8:22 PM, Charles R Harris charlesr.har...@gmail.com
 wrote:

 Hi All,

 Julian has raised the question of including __numpy_ufunc__ in numpy 1.9. I
 don't feel strongly one way or the other, but it doesn't seem to be
 finished yet and 1.10 might be a better place to work out the remaining
 problems along with the astropy folks testing possible uses.

 Thoughts?


It's already in, so do you mean not using it? It would help to know what the
issue is, because it's finished enough that it's already used in a released
version of scipy (in sparse matrices).

Ralf


[Numpy-discussion] `allclose` vs `assert_allclose`

2014-07-16 Thread Tony Yu
Is there any reason why the defaults for `allclose` and `assert_allclose`
differ? This makes debugging a broken test much more difficult. More
importantly, using an absolute tolerance of 0 causes failures for some
common cases. For example, if two values are very close to zero, a test
will fail:

np.testing.assert_allclose(0, 1e-14)

Git blame suggests the change was made in the following commit, but I guess
that change only reverted to the original behavior.

https://github.com/numpy/numpy/commit/f43223479f917e404e724e6a3df27aa701e6d6bf

It seems like the defaults for  `allclose` and `assert_allclose` should
match, and an absolute tolerance of 0 is probably not ideal. I guess this
is a pretty big behavioral change, but the current default for
`assert_allclose` doesn't seem ideal.
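For concreteness, the differing defaults side by side (currently rtol=1e-05,
atol=1e-08 for `allclose` versus rtol=1e-07, atol=0 for `assert_allclose`):

import numpy as np

print(np.allclose(0, 1e-14))            # True: 1e-14 is within the default atol of 1e-08

try:
    np.testing.assert_allclose(0, 1e-14)  # atol defaults to 0, so this raises
except AssertionError:
    print("assert_allclose failed")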

Thanks,
-Tony


Re: [Numpy-discussion] String type again.

2014-07-16 Thread Chris Barker
On Mon, Jul 14, 2014 at 10:39 AM, Andrew Collette andrew.colle...@gmail.com
 wrote:


 For storing data in HDF5 (PyTables or h5py), it would be somewhat
 cleaner if either ASCII or UTF-8 are used, as these are the only two
 charsets officially supported by the library.


good argument for ASCII, but utf-8 is a bad idea, as there is no 1:1
correspondence between length of string in bytes and length in characters
-- as numpy needs to pre-allocate a defined number of bytes for a dtype,
there is a disconnect between the user and numpy as to how long a string is
being stored...this isn't a problem for immutable strings, and less of a
problem for HDF, as you can determine how many bytes you need before you
write the file (or does HDF support var-length elements?)
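A two-line illustration of the bytes-vs-characters mismatch (the particular
string is arbitrary):

s = u"Δ12°"                      # 4 characters (delta, two digits, degree sign)
print(len(s))                    # 4
print(len(s.encode("utf-8")))    # 6 -- a fixed 4-byte UTF-8 field would silently truncate it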


  Latin-1 would require a
 custom read/write converter, which isn't the end of the world


custom? it would be an encoding operation -- which you'd need to go from
utf-8 to/from unicode anyway. So you would lose the ability to have a nice
1:1 binary representation map between numpy and HDF... good argument for
ASCII, I guess. Or for HDF to use latin-1 ;-)

Does HDF enforce ascii-only? what does it do with the > 127 values?


 would be tricky to do in a correct way, and likely somewhat slow.
 We'd also run into truncation issues since certain latin-1 chars
 become multibyte sequences in UTF8.


that's the whole issue with UTF-8 -- it needs to be addressed somewhere,
and the numpy-HDF interface seems like a smarter place to put it than the
numpy-user interface!

I assume 'a' strings would still be null-padded?


yup.



-- 

Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/ORR(206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA  98115   (206) 526-6317   main reception

chris.bar...@noaa.gov


Re: [Numpy-discussion] String type again.

2014-07-16 Thread Jeff Reback
in 0.15.0 pandas will have full-fledged support for categoricals, which in
effect let you map a small set of strings to integers

this is now in pandas master 

http://pandas-docs.github.io/pandas-docs-travis/categorical.html

feedback welcome!
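A minimal sketch of what that looks like (as of 0.15/master; the label values
are invented):

import pandas as pd

s = pd.Series(["HST", "JWST", "HST", "Chandra", "HST"], dtype="category")
print(s.cat.categories)    # the small set of distinct labels
print(s.cat.codes.values)  # integer codes, one per element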

 On Jul 14, 2014, at 1:00 PM, Olivier Grisel olivier.gri...@ensta.org wrote:
 
 2014-07-13 19:05 GMT+02:00 Alexander Belopolsky ndar...@mac.com:
 
 On Sat, Jul 12, 2014 at 8:02 PM, Nathaniel Smith n...@pobox.com wrote:
 
 I feel like for most purposes, what we *really* want is a variable length
 string dtype (I.e., where each element can be a different length.).
 
 
 
 I've been toying with the idea of creating an array type for interned
 strings.  In many applications dealing with large arrays of variable size
 strings, the strings come from a relatively short set of names.  Arrays of
 interned strings can be manipulated very efficiently because in many respects
 they are just like arrays of integers.
 
 +1 I think this is why pandas is using dtype=object to load string
 data: in many cases short string values are used to represent
 categorical variables with a comparatively small cardinality of
 possible values for a dataset with comparatively numerous records.
 
 In that case the dtype=object is not that bad as it just stores
 pointer on string objects managed by Python. It's possible to intern
 the strings manually at load time (I don't know if pandas or python
 already do it automatically in that case). The integer semantics is
 good for that case. Having an explicit dtype might be even better.
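Something in this spirit can already be faked with plain numpy -- a sketch,
with made-up data:

import numpy as np

raw = np.array(["red", "green", "red", "blue", "green", "red"], dtype=object)

# factorize: a small table of unique labels plus one integer code per element
labels, codes = np.unique(raw, return_inverse=True)
print(labels)          # ['blue' 'green' 'red']
print(codes)           # [2 1 2 0 1 2]
print(labels[codes])   # round-trips back to the original strings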
 
 -- 
 Olivier
 http://twitter.com/ogrisel - http://github.com/ogrisel


Re: [Numpy-discussion] String type again.

2014-07-16 Thread Charles R Harris
On Tue, Jul 15, 2014 at 9:15 AM, Charles R Harris charlesr.har...@gmail.com
 wrote:




 On Tue, Jul 15, 2014 at 5:26 AM, Sebastian Berg 
 sebast...@sipsolutions.net wrote:

 On Sa, 2014-07-12 at 12:17 -0500, Charles R Harris wrote:
  As previous posts have pointed out, Numpy's `S` type is currently
  treated as a byte string, which leads to more complicated code in
  python3. OTOH, the unicode type is stored as UCS4, which consumes a
  lot of space, especially for ascii strings. This note proposes to
  adapt the currently existing 'a' type letter, currently aliased to
  'S', as a new fixed encoding dtype. Python 3.3 introduced two one byte
  internal representations for unicode strings, ascii and latin1. Ascii
  has the advantage that it is a subset of UTF-8, whereas latin1 has a
  few more symbols. Another possibility is to just make it a UTF-8
  encoding, but I think this would involve more overhead as Python would
  need to determine the maximum character size. These are just
  preliminary thoughts, comments are welcome.
 

 Just wondering, couldn't we have a type which actually has an
 (arbitrary, python supported) encoding (and bytes might even just be a
 special case of no encoding)? Basically storing bytes and on access do
 element[i].decode(specified_encoding) and on storing element[i] =
 value.encode(specified_encoding).

 There is always the never ending small issue of trailing null bytes. If
 we want to be fully compatible, such a type would have to store the
 string length explicitly to support trailing null bytes.


 UTF-8 encoding works with null bytes. That is one of the reasons it is so
 popular.


Thinking more about it, the easiest thing to do might be to make the S
dtype a UTF-8 encoding. Most of the machinery to deal with that is already
in place. That change might affect some users though, and we might need to
do some work to make it backwards compatible with python 2.

Chuck


Re: [Numpy-discussion] __numpy_ufunc__

2014-07-16 Thread Benjamin Root
Perhaps a bit of context might be useful? How is __numpy_ufunc__ different from
the ufuncs that we know and love? What are the known implications? What are
the known shortcomings? Are there ABI and/or API concerns between 1.9 and
1.10?

Ben Root


On Mon, Jul 14, 2014 at 2:22 PM, Charles R Harris charlesr.har...@gmail.com
 wrote:

 Hi All,

 Julian has raised the question of including __numpy_ufunc__ in numpy 1.9. I
 don't feel strongly one way or the other, but it doesn't seem to be
 finished yet and 1.10 might be a better place to work out the remaining
 problems along with the astropy folks testing possible uses.

 Thoughts?

 Chuck



Re: [Numpy-discussion] __numpy_ufunc__

2014-07-16 Thread Ralf Gommers
On Wed, Jul 16, 2014 at 10:07 AM, Nathaniel Smith n...@pobox.com wrote:

 Weirdly, I never received Chuck's original email in this thread. Should
 some list admin be informed?

Also weirdly, my reply didn't show up on gmane. Not sure if it got through,
so re-sending:

It's already in, so do you mean not using it? It would help to know what the
issue is, because it's finished enough that it's already used in a released
version of scipy (in sparse matrices).

Ralf

I also am not sure what/where Julian's comments were, so I second the call
 for context :-). Putting it off until 1.10 doesn't seem like an obviously
 bad idea to me, but specifics would help...

 (__numpy_ufunc__ is the new system for allowing arbitrary third party
 objects to override how ufuncs are applied to them, i.e. it means
 np.sin(sparsemat) and np.sin(gpuarray) can be defined to do something
 sensible. Conceptually it replaces the old __array_prepare__/__array_wrap__
 system, which was limited to ndarray subclasses and has major limits on
 what you can do. Of course __array_prepare/wrap__ will also continue to be
 supported for compatibility.)

-n
 On 16 Jul 2014 00:10, Benjamin Root ben.r...@ou.edu wrote:

 Perhaps a bit of context might be useful? How is __numpy_ufunc__ different
 from the ufuncs that we know and love? What are the known implications?
 What are the known shortcomings? Are there ABI and/or API concerns between
 1.9 and 1.10?

 Ben Root


 On Mon, Jul 14, 2014 at 2:22 PM, Charles R Harris 
 charlesr.har...@gmail.com wrote:

 Hi All,

 Julian has raised the question of including __numpy_ufunc__ in numpy 1.9. I
 don't feel strongly one way or the other, but it doesn't seem to be
 finished yet and 1.10 might be a better place to work out the remaining
 problems along with the astropy folks testing possible uses.

 Thoughts?

 Chuck



[Numpy-discussion] parallel distutils extensions build? use gcc -flto

2014-07-16 Thread Julian Taylor
hi,
I have been playing around a bit with gcc's link time optimization
feature and found that using it actually speeds up a from-scratch build
of numpy due to its ability to perform parallel optimization and linking.
As a bonus you should also get faster binaries due to the better
optimizations lto allows.

As compiling with lto requires some possibly lesser-known details I
wanted to share it.

Prerequisites are a working gcc toolchain of at least gcc-4.8 and
binutils > 2.21; gcc 4.9 is better as it's faster.

First of all, numpy checks the long double representation by compiling a
file and looking at the binary. This won't work here because the od -b
reimplementation does not understand lto objects, so on x86 we must
short-circuit that check:
--- a/numpy/core/setup_common.py
+++ b/numpy/core/setup_common.py
@@ -174,6 +174,7 @@ def check_long_double_representation(cmd):
 # We need to use _compile because we need the object filename
 src, object = cmd._compile(body, None, None, 'c')
 try:
+   return 'IEEE_DOUBLE_LE'
 type = long_double_representation(pyod(object))
 return type
 finally:


Next we build numpy as usual but override the compiler, linker and ar to
add our custom flags.
The setup.py call would look like this:

CC='gcc -fno-fat-lto-objects -flto=4 -fuse-linker-plugin -O3' \
LDSHARED='gcc -fno-fat-lto-objects -flto=4 -fuse-linker-plugin -shared
-O3' AR=gcc-ar \
python setup.py build_ext

Some explanation:
The ar override is needed as numpy builds a static library and ar needs
to know about lto objects. gcc-ar does exactly that.
-flto=4 is the main flag; it tells gcc to perform link time optimizations
using 4 parallel processes.
-fno-fat-lto-objects tells gcc to build only lto objects; normally it
builds both an lto object and a normal object for toolchain
compatibility. If our toolchain can handle lto objects this is just a
waste of time and we skip it. (The flag is the default in gcc-4.9 but not 4.8.)
-fuse-linker-plugin directs gcc to run its link time optimizer plugin in
the linking step; the linker must support plugins, and both bfd (> 2.21) and
the gold linker do so. This allows for more optimizations.
-O3 has to be added to the linker too as that's where the optimization
occurs. In general a problem with lto is that the compiler options of
all steps must match the flags used for linking.

If you are using C++ or gfortran you also have to override those to use
lto (CXX and FF(?)).

See https://gcc.gnu.org/onlinedocs/gcc/Optimize-Options.html for a lot
more details.


For some numbers: on my machine a from-scratch numpy build with no
caching takes 1min55s; with lto on 4 processes it only takes 55s. Pretty
neat for a much more involved optimization process.

Concerning the speed gain we get by this, I ran our benchmark suite with
this build; there were no really significant gains, which is somewhat
expected as numpy is simple C code with most function bottlenecks
already inlined.

So conclusion: flto seems to work well with recent gccs and allows for
faster builds using the limited distutils. While probably not useful for
development, where compiler caching (ccache) is of utmost importance, it
is still interesting for projects doing one-shot uncached builds (travis-like
CI) that have huge objects (e.g. swig or cython) and don't want to
change to proper parallel build systems like bento.

PS: As far as I know clang also supports lto, but I have never used it
PPS: using NPY_SEPARATE_COMPILATION=0 crashes gcc-4.9, time for a bug
report.

Cheers,
Julian


Re: [Numpy-discussion] String type again.

2014-07-16 Thread Chris Barker - NOAA Federal
 But HDF5
 additionally has a fixed-storage-width UTF8 type, so we could map to a
 NumPy fixed-storage-width type trivially.

Sure -- this is why *nix uses utf-8 for filenames -- it can just be a
char*. But that just punts the problem to client code.

I think a UTF-8 string type does not match the numpy model well, and I
don't think we should support it just because it would be easier for
the HDF 5 wrappers.

(to be fair, there are probably other similar systems numpy wants to
interface with that could use this...)

It seems if you want a 1:1 binary mapping between HDF and numpy for
utf strings, then a bytes type in numpy makes more sense. Numpy
could/should have encode and decode methods for converting byte arrays
to/from Unicode arrays (does it already?).
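For reference, numpy does expose elementwise encode/decode through np.char; a
small round-trip sketch with latin-1:

import numpy as np

b = np.array([b"caf\xe9", b"ni\xf1o"], dtype="S4")   # latin-1 encoded bytes

u = np.char.decode(b, "latin-1")    # 'S' (bytes) -> 'U' (unicode) array
print(u)                            # ['café' 'niño']

b2 = np.char.encode(u, "latin-1")   # back to a bytes array
print(np.all(b2 == b))              # True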

 Custom in this context means a user-created HDF5 data-conversion
 filter, which is necessary since all data conversion is handled inside
 the HDF5 library.

 As far as generic Unicode goes, we currently don't support the NumPy
 U dtype in h5py for similar reasons; there's no destination type in
 HDF5 which (1) would preserve the dtype for round-trip write/read
 operations and (2) doesn't risk truncation.

It sounds to me like HDF5 simply doesn't support Unicode. Calling an
array of bytes utf-8 simply pushes the problem on to client libs. As
that's where the problem lies, PyHDF may be the place to
address it.

If we put utf-8 in numpy, we have the truncation problem there instead
-- which is exactly what I think we should avoid.

 A Latin-1 based 'a' type
 would have similar problems.

Maybe not -- latin1 is fixed width.

 Does HDF enforce ascii-only? what does it do with the > 127 values?

 Unfortunately/fortunately the charset is not enforced for either ASCII

So you can dump Latin-1 into and out of the HDF 'ASCII' type -- it's
essentially the old char* / py2 string. An ugly situation, but why not
use it?

 or UTF-8,

So ASCII and utf-8 are really the same thing, with different meta-data...

 although the HDF Group has been thinking about it.

I wonder if they would consider going Latin-1 instead of ASCII --
similarly to utf-8 it's backward compatible with ASCII, but gives you
a little more.

I don't know that there is another 1-byte encoding worth using -- it
may be my English bias, but it seems Latin-1 gives us ASCII + some
extra stuff handy for science (I use the degree symbol a lot, for
instance) with nothing lost.

 Ideally, NumPy would support variable-length
 strings, in which case all these headaches would go away.

Would they? That would push the problem back to PyHDF -- which I'm
arguing is where it belongs, but I didn't think you were ;-)

 But I
 imagine that's also somewhat complicated. :)

That's a whole other kettle of fish, yes.


-Chris


Re: [Numpy-discussion] Rounding float to integer while minizing the difference between the two arrays?

2014-07-16 Thread Chao YUE
Sorry, there is one error in this part of the code; it should be:


import math

def convert_integer(x, threshold=0):
    """
    This function converts the float number x to an integer according to the
    threshold.
    """
    if abs(x - 0) < 1e-5:
        return 0
    else:
        pdec, pint = math.modf(x)
        if pdec > threshold:
            return int(math.ceil(pint) + 1)
        else:
            return int(math.ceil(pint))



On Wed, Jul 16, 2014 at 3:18 PM, Chao YUE chaoyue...@gmail.com wrote:

 Dear all,

 I have two arrays, both of float type, let's say X and Y. I want to round
 X to integers (intX) according to some decimal threshold, and at the same
 time I want to keep the following difference small:

 diff = np.sum(X*Y) - np.sum(intX*Y)

 I don't necessarily have to minimize the diff variable (if doing so would
 make the computation time too long). But I would like to limit the
 diff to, let's say, ten percent of np.sum(X*Y).

 I have tried to write some functions, but I don't know where to start the
 optimization.

 import math

 def convert_integer(x, threshold=0):
     """
     This function converts the float number x to an integer according to the
     threshold.
     """
     if abs(x - 0) < 1e5:
         return 0
     else:
         pdec, pint = math.modf(x)
         if pdec > threshold:
             return int(math.ceil(pint) + 1)
         else:
             return int(math.ceil(pint))

 def convert_arr(arr, threshold=0):
     out = arr.copy()
     for i, num in enumerate(arr):
         out[i] = convert_integer(num, threshold=threshold)
     return out

 In [147]:
 convert_arr(np.array([0.14,1.14,0.12]),0.13)

 Out[147]:
 array([1, 2, 0])

 Now my problem is, how can I minimize or limit the following?
 diff = np.sum(X*Y) - np.sum(convert_arr(X,threshold=?)*Y)

 Because it's the first time I have encountered this kind of question, please
 give me some clue to get started :p Thanks a lot in advance.

 Best,

 Chao

 --
 please visit:
 http://www.globalcarbonatlas.org/

 ***
 Chao YUE
 Laboratoire des Sciences du Climat et de l'Environnement (LSCE-IPSL)
 UMR 1572 CEA-CNRS-UVSQ
 Batiment 712 - Pe 119
 91191 GIF Sur YVETTE Cedex
 Tel: (33) 01 69 08 29 02; Fax:01.69.08.77.16

 




-- 
please visit:
http://www.globalcarbonatlas.org/
***
Chao YUE
Laboratoire des Sciences du Climat et de l'Environnement (LSCE-IPSL)
UMR 1572 CEA-CNRS-UVSQ
Batiment 712 - Pe 119
91191 GIF Sur YVETTE Cedex
Tel: (33) 01 69 08 29 02; Fax:01.69.08.77.16



Re: [Numpy-discussion] Rounding float to integer while minizing the difference between the two arrays?

2014-07-16 Thread Chao YUE
Dear all,

Sorry for the noise -- it turns out this is not difficult.
scipy.optimize.minimize_scalar seems to solve my problem. Thanks anyway for
this great tool.
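For the record, roughly what that looks like -- a self-contained sketch with
made-up X and Y and a vectorized stand-in for convert_arr from the message
below; since the objective is piecewise constant, the bounded scalar search is
only a heuristic:

import numpy as np
from scipy.optimize import minimize_scalar

# toy data standing in for the real X and Y
rng = np.random.RandomState(0)
X = rng.uniform(0, 5, size=100)
Y = rng.uniform(0, 1, size=100)

def convert_arr(arr, threshold):
    # same idea as convert_arr below: round up when the fractional part
    # exceeds the threshold, round down otherwise
    frac = arr - np.floor(arr)
    return np.where(frac > threshold, np.floor(arr) + 1, np.floor(arr))

def objective(threshold):
    # absolute rounding error introduced by converting X with this threshold
    return abs(np.sum(X * Y) - np.sum(convert_arr(X, threshold) * Y))

res = minimize_scalar(objective, bounds=(0.0, 1.0), method="bounded")
print(res.x, res.fun)   # chosen threshold and the remaining |diff|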

Cheers,

Chao


On Wed, Jul 16, 2014 at 3:18 PM, Chao YUE chaoyue...@gmail.com wrote:

 Dear all,

 I have two arrays, both of float type, let's say X and Y. I want to round
 X to integers (intX) according to some decimal threshold, and at the same
 time I want to keep the following difference small:

 diff = np.sum(X*Y) - np.sum(intX*Y)

 I don't necessarily have to minimize the diff variable (if doing so would
 make the computation time too long). But I would like to limit the
 diff to, let's say, ten percent of np.sum(X*Y).

 I have tried to write some functions, but I don't know where to start the
 optimization.

 import math

 def convert_integer(x, threshold=0):
     """
     This function converts the float number x to an integer according to the
     threshold.
     """
     if abs(x - 0) < 1e5:
         return 0
     else:
         pdec, pint = math.modf(x)
         if pdec > threshold:
             return int(math.ceil(pint) + 1)
         else:
             return int(math.ceil(pint))

 def convert_arr(arr, threshold=0):
     out = arr.copy()
     for i, num in enumerate(arr):
         out[i] = convert_integer(num, threshold=threshold)
     return out

 In [147]:
 convert_arr(np.array([0.14,1.14,0.12]),0.13)

 Out[147]:
 array([1, 2, 0])

 Now my problem is, how can I minimize or limit the following?
 diff = np.sum(X*Y) - np.sum(convert_arr(X,threshold=?)*Y)

 Because it's the first time I have encountered this kind of question, please
 give me some clue to get started :p Thanks a lot in advance.

 Best,

 Chao

 --
 please visit:
 http://www.globalcarbonatlas.org/

 ***
 Chao YUE
 Laboratoire des Sciences du Climat et de l'Environnement (LSCE-IPSL)
 UMR 1572 CEA-CNRS-UVSQ
 Batiment 712 - Pe 119
 91191 GIF Sur YVETTE Cedex
 Tel: (33) 01 69 08 29 02; Fax:01.69.08.77.16

 




-- 
please visit:
http://www.globalcarbonatlas.org/
***
Chao YUE
Laboratoire des Sciences du Climat et de l'Environnement (LSCE-IPSL)
UMR 1572 CEA-CNRS-UVSQ
Batiment 712 - Pe 119
91191 GIF Sur YVETTE Cedex
Tel: (33) 01 69 08 29 02; Fax:01.69.08.77.16



[Numpy-discussion] Rounding float to integer while minizing the difference between the two arrays?

2014-07-16 Thread Chao YUE
Dear all,

I have two arrays, both of float type, let's say X and Y. I want to round
X to integers (intX) according to some decimal threshold, and at the same
time I want to keep the following difference small:

diff = np.sum(X*Y) - np.sum(intX*Y)

I don't necessarily have to minimize the diff variable (if doing so would
make the computation time too long). But I would like to limit the
diff to, let's say, ten percent of np.sum(X*Y).

I have tried to write some functions, but I don't know where to start the
optimization.

import math

def convert_integer(x, threshold=0):
    """
    This function converts the float number x to an integer according to the
    threshold.
    """
    if abs(x - 0) < 1e5:
        return 0
    else:
        pdec, pint = math.modf(x)
        if pdec > threshold:
            return int(math.ceil(pint) + 1)
        else:
            return int(math.ceil(pint))

def convert_arr(arr, threshold=0):
    out = arr.copy()
    for i, num in enumerate(arr):
        out[i] = convert_integer(num, threshold=threshold)
    return out

In [147]:
convert_arr(np.array([0.14,1.14,0.12]),0.13)

Out[147]:
array([1, 2, 0])

Now my problem is, how can I minimize or limit the following?
diff = np.sum(X*Y) - np.sum(convert_arr(X,threshold=?)*Y)

Because it's the first time I have encountered this kind of question, please
give me some clue to get started :p Thanks a lot in advance.

Best,

Chao

-- 
please visit:
http://www.globalcarbonatlas.org/
***
Chao YUE
Laboratoire des Sciences du Climat et de l'Environnement (LSCE-IPSL)
UMR 1572 CEA-CNRS-UVSQ
Batiment 712 - Pe 119
91191 GIF Sur YVETTE Cedex
Tel: (33) 01 69 08 29 02; Fax:01.69.08.77.16
