2012/10/11 Andreas Mueller <[email protected]>:
> I don't really understand that. The problem seems to be
> that a Python int cannot be converted to a C int.
> This seems pretty weird since we have the same situation
> (as far as I understand) in many other places without failures.

We've been very sloppy wrt. integer sizes, and so far we've been lucky.

It seems that _labels_inertia builds an array of dtype=np.int, whose
size is, I think, platform-dependent. On my box, it's 64-bit:

In [1]: np.array([], dtype=np.int).dtype
Out[1]: dtype('int64')

This array of integers is then fed to _centers_dense as an array of C
ints, the size of which is also platform-dependent, though usually
32-bit.
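
To see whether the two widths actually differ on a given machine, a
quick check along these lines should do. It's only a sketch: it uses
np.dtype(int) instead of np.int (that alias has been removed from
recent NumPy releases) and np.intc as NumPy's name for the C
compiler's plain int. The variable names are mine.

```python
import numpy as np

# Width in bits of NumPy's default integer (what dtype=int maps to here).
default_bits = np.dtype(int).itemsize * 8

# Width in bits of the C compiler's plain int on this platform.
c_int_bits = np.dtype(np.intc).itemsize * 8

print("default integer:", default_bits, "bits")
print("C int:          ", c_int_bits, "bits")
print("widths agree:   ", default_bits == c_int_bits)
```

If the last line prints False, passing the first kind of array to code
that expects the second is exactly the mismatch described above.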

I think the solution would be to construct the labels array with an
explicitly specified integer width, then use that same size
everywhere. I.e. use np.int32 in Python code and np.int32_t in Cython.
Neither Python's int nor C's int provides enough guarantees for
cross-platform compatibility.
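
On the Python side, that could look roughly like the sketch below.
as_int32_labels is a hypothetical helper name, not something that
exists in scikit-learn; the point is only that the width is pinned to
np.int32 in one place and that out-of-range values are rejected
instead of silently wrapping.

```python
import numpy as np


def as_int32_labels(labels):
    """Return `labels` as a contiguous int32 array (hypothetical helper).

    Raises instead of silently truncating values that don't fit in 32 bits.
    """
    labels = np.asarray(labels)
    if not np.issubdtype(labels.dtype, np.integer):
        raise TypeError("labels must have an integer dtype")
    info = np.iinfo(np.int32)
    if labels.size and (labels.min() < info.min or labels.max() > info.max):
        raise OverflowError("label value does not fit in int32")
    # Explicit width: the same on every platform, unlike dtype=int.
    return np.ascontiguousarray(labels, dtype=np.int32)


labels = as_int32_labels(np.array([0, 2, 1, 2]))
print(labels.dtype)  # int32
```

The Cython function receiving the array would then declare its buffer
with the matching np.int32_t element type rather than plain int.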

(I've recently been browsing the Pandas source code, and I can
recommend that all Cythonistas working on scikit-learn do the same. The
Pandas folks are quite precise about integer sizes <rant>unlike the
stupid Cython tutorial which says that Py_ssize_t is for "purists" and
happily advises people to write incorrect code</rant>.)

-- 
Lars Buitinck
Scientific programmer, ILPS
University of Amsterdam

_______________________________________________
Scikit-learn-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general