[Numpy-discussion] Using numpy on hadoop streaming: ImportError: cannot import name multiarray

2015-02-10 Thread Kartik Kumar Perisetla
Hi all,

For one of my projects I am using NLTK for POS tagging,
which internally uses an 'english.pickle' file. I managed to package the
NLTK library with these pickle files to make them available to the mapper and
reducer of a Hadoop streaming job using the -file option.

However, when the NLTK library tries to load that pickle file, it raises an
error for numpy, since the cluster I am running this job on does not have
numpy installed. Also, I don't have root access, so I can't install numpy
or any other package on the cluster. The only way is to package the Python
modules to make them available to the mapper and reducer, which I managed
to do. But now the problem is that when numpy is imported, it imports
multiarray by default (as seen in __init__.py), and this is where I get
the error:

  File "/usr/lib64/python2.6/pickle.py", line 1370, in load
    return Unpickler(file).load()
  File "/usr/lib64/python2.6/pickle.py", line 858, in load
    dispatch[key](self)
  File "/usr/lib64/python2.6/pickle.py", line 1090, in load_global
    klass = self.find_class(module, name)
  File "/usr/lib64/python2.6/pickle.py", line 1124, in find_class
    __import__(module)
  File "numpy.mod/numpy/__init__.py", line 170, in <module>
  File "numpy.mod/numpy/add_newdocs.py", line 13, in <module>
  File "numpy.mod/numpy/lib/__init__.py", line 8, in <module>
  File "numpy.mod/numpy/lib/type_check.py", line 11, in <module>
  File "numpy.mod/numpy/core/__init__.py", line 6, in <module>
ImportError: cannot import name multiarray

I tried copying the numpy directory from my local machine, which contains
multiarray.pyd, to the cluster to make it available to the mapper and reducer,
but this didn't help.
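
One possible explanation, offered as an assumption and not a confirmed diagnosis: multiarray is a compiled extension module, a .pyd file is the Windows form of such a module, and the /usr/lib64 paths in the traceback suggest the cluster runs Linux, where the interpreter looks for a different suffix. The sketch below checks whether a shipped package directory contains a binary with a suffix the running interpreter accepts. The helper name `find_loadable_extension` is hypothetical, and the code targets Python 3; on the Python 2.6 shown in the traceback, the suffix list would have to come from `imp.get_suffixes()` instead.

```python
import importlib.machinery
import os


def find_loadable_extension(package_dir, module_name, suffixes=None):
    """Return files in package_dir that could be loaded as module_name."""
    if suffixes is None:
        # Suffixes of compiled modules that the running interpreter accepts,
        # for example '.so' variants on Linux or '.pyd' on Windows.
        suffixes = importlib.machinery.EXTENSION_SUFFIXES
    matches = []
    for suffix in suffixes:
        candidate = os.path.join(package_dir, module_name + suffix)
        if os.path.isfile(candidate):
            matches.append(candidate)
    return matches
```

Run on a cluster node, a call such as `find_loadable_extension('numpy.mod/numpy/core', 'multiarray')` returning an empty list would be consistent with a binary built for another platform.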

Any input on how to resolve this (keeping the constraint that I cannot
install anything on the cluster machines)?

Thanks!

-- 
Regards,

Kartik Perisetla
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


[Numpy-discussion] Aligned / configurable memory allocation

2015-02-10 Thread Antoine Pitrou

Hello,

I apologize for pinging the list, but I was wondering if there was
interest in either of https://github.com/numpy/numpy/pull/5457 (make
array data aligned by default) or
https://github.com/numpy/numpy/pull/5470 (make the array data allocator
configurable)?

Regards

Antoine.




Re: [Numpy-discussion] Silent Broadcasting considered harmful

2015-02-10 Thread Sturla Molden
Chris Barker chris.bar...@noaa.gov wrote:

> The strongest use-case seems to be for teaching that involves linear
> algebra concepts, not real production code.

Not really. SymPy is a better teaching tool.

Some find A*B easier to read than dot(A,B). But with the @ operator in
Python 3.5, the matrix class no longer has a use case at all.
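
The spellings being compared can be sketched as follows, assuming NumPy on Python 3.5 or later. The array values are made up for illustration.

```python
import numpy as np

A = np.array([[1.0, 2.0], [3.0, 4.0]])
B = np.array([[0.0, 1.0], [1.0, 0.0]])

elementwise = A * B       # on ndarrays, * is the element-by-element product
via_dot = np.dot(A, B)    # matrix product, function-call spelling
via_operator = A @ B      # matrix product, @ operator (Python 3.5+)

# With the matrix class, * itself means the matrix product.
via_matrix = np.asmatrix(A) * np.asmatrix(B)
```

All three matrix-product spellings give the same values; only the readability differs.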


Sturla



Re: [Numpy-discussion] Silent Broadcasting considered harmful

2015-02-10 Thread Todd
On Feb 10, 2015 1:03 AM, cjw c...@ncf.ca wrote:


> On 09-Feb-15 2:34 AM, Stefan Reiterer wrote:
> >
> > OK, those are indeed some good reasons to keep the status quo, especially
> > since performance is crucial for numpy.
> > It's a dilemma: using the matrix class for linear algebra would be the
> > correct way to do such things, but the matrix API is not as powerful and
> > beautiful as that of arrays.
> > On the other hand, arrays are beautiful, but not exactly intended for
> > linear algebra.
> > So maybe the better way would be not to add warnings to broadcasting
> > operations, but to overhaul the matrix class to make it more attractive
> > for numerical linear algebra(?)
>
> +1
> I hope that this will be explored.  @ could still be used by those who
> wish to remain in the array world.
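
For context, the broadcasting behaviour that the quoted proposal would have warned about can be sketched like this, assuming NumPy. The helper `add_same_shape` is hypothetical and only illustrates what a strict, non-broadcasting addition might look like; nothing like it is proposed in the thread.

```python
import numpy as np

row = np.array([1.0, 2.0, 3.0])            # shape (3,)
col = np.array([[10.0], [20.0], [30.0]])   # shape (3, 1)

# No error and no warning: the operands are broadcast to shape (3, 3).
total = row + col


def add_same_shape(x, y):
    """Add two arrays, refusing to broadcast (hypothetical strict helper)."""
    x, y = np.asarray(x), np.asarray(y)
    if x.shape != y.shape:
        raise ValueError("shape mismatch: %s vs %s" % (x.shape, y.shape))
    return x + y
```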


What about splitting it off into a scikit, or at least some sort of
separate package?  If there is sufficient interest in it, it can be
maintained there.  If not, at least people can use it as-is.  But there
would not be any expectation going forward that the rest of numpy has to
work well with it.


Re: [Numpy-discussion] Aligned / configurable memory allocation

2015-02-10 Thread Nathaniel Smith
On 10 Feb 2015 09:11, Antoine Pitrou solip...@pitrou.net wrote:


> Hello,
>
> I apologize for pinging the list, but I was wondering if there was
> interest in either of https://github.com/numpy/numpy/pull/5457 (make
> array data aligned by default) or
> https://github.com/numpy/numpy/pull/5470 (make the array data allocator
> configurable)?

I'm not a fan of the configurable allocator. It adds new public APIs for us
to support, and makes switching to using Python's own memory allocation
APIs more complex. The feature is intrinsically dangerous, because newly
installed deallocators must be able to handle memory allocated by the
previous allocator. (AFAICT the included test case can crash the test
process if you get unlucky and GC runs during it?). And no one's
articulated any compelling argument for why we need this configurability.

Regarding the aligned allocation patch, I think the problem is just that
none of us have any way to evaluate it. I'd feel a lot more comfortable
with some solid numbers showing the costs and benefits on old and new
systems.
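
A rough starting point for gathering such numbers: the sketch below reports how the data pointer of a NumPy array happens to be aligned, and builds an array aligned to a chosen boundary by over-allocating and slicing. The helper names `data_alignment` and `aligned_empty` are hypothetical, and this is a common user-level workaround, not the approach taken in either pull request.

```python
import numpy as np


def data_alignment(arr, max_alignment=4096):
    """Largest power of two (capped at max_alignment) dividing the data address."""
    address = arr.ctypes.data
    alignment = 1
    while alignment < max_alignment and address % (alignment * 2) == 0:
        alignment *= 2
    return alignment


def aligned_empty(n, dtype=np.float64, alignment=64):
    """Return an uninitialized 1-D array whose data starts on an aligned address."""
    dtype = np.dtype(dtype)
    nbytes = n * dtype.itemsize
    # Over-allocate by `alignment` bytes, then skip ahead to the first
    # address that is a multiple of `alignment`.
    buf = np.empty(nbytes + alignment, dtype=np.uint8)
    offset = (-buf.ctypes.data) % alignment
    return buf[offset:offset + nbytes].view(dtype)
```

Timing the same kernel on `np.empty(n)` and `aligned_empty(n)` on old and new hardware would be one way to produce the cost/benefit figures requested.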

-n


Re: [Numpy-discussion] Aligned / configurable memory allocation

2015-02-10 Thread Nathaniel Smith
On 10 Feb 2015 13:10, Antoine Pitrou solip...@pitrou.net wrote:

> On Tue, 10 Feb 2015 11:26:22 -0800
> Nathaniel Smith n...@pobox.com wrote:
> > On 10 Feb 2015 09:11, Antoine Pitrou solip...@pitrou.net wrote:
> > >
> > > Hello,
> > >
> > > I apologize for pinging the list, but I was wondering if there was
> > > interest in either of https://github.com/numpy/numpy/pull/5457 (make
> > > array data aligned by default) or
> > > https://github.com/numpy/numpy/pull/5470 (make the array data
> > > allocator configurable)?
> >
> > I'm not a fan of the configurable allocator. It adds new public APIs
> > for us to support, and makes switching to using Python's own memory
> > allocation APIs more complex. The feature is intrinsically dangerous,
> > because newly installed deallocators must be able to handle memory
> > allocated by the previous allocator. (AFAICT the included test case can
> > crash the test process if you get unlucky and GC runs during it?).
>
> It's taken care of in the patch.

Ah, I see -- I missed that you added an allocator field to PyArrayObject.
That does reduce my objections to the patch. But I'm still not sure what
problems this is solving exactly.
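
The hazard and the fix discussed above can be modelled with a toy example in pure Python. Every name here (`CountingAllocator`, `ToyArray`, `set_allocator`) is hypothetical and has no counterpart in the NumPy C API; the sketch only illustrates why recording the allocator on each object makes swapping the global allocator safe.

```python
class CountingAllocator:
    """Toy allocator that remembers which blocks it handed out."""

    def __init__(self, name):
        self.name = name
        self.live = set()
        self.next_id = 0

    def allocate(self, nbytes):
        self.next_id += 1
        block = (self.name, self.next_id, nbytes)
        self.live.add(block)
        return block

    def free(self, block):
        # Freeing a block from another allocator is the dangerous case.
        if block not in self.live:
            raise RuntimeError("%s cannot free foreign block %r" % (self.name, block))
        self.live.remove(block)


current_allocator = CountingAllocator("default")


def set_allocator(allocator):
    """Install a new global allocator and return the previous one."""
    global current_allocator
    previous, current_allocator = current_allocator, allocator
    return previous


class ToyArray:
    def __init__(self, nbytes):
        # Remember which allocator produced the data ...
        self.allocator = current_allocator
        self.data = self.allocator.allocate(nbytes)

    def release(self):
        # ... so release does not depend on the global setting at free time.
        self.allocator.free(self.data)
        self.data = None
```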

Also, if we do decide to add a deallocation callback to PyArrayObject then
I think we should take advantage of the opportunity to also make life
easier for C API users who need a custom callback on a case-by-case basis
and currently have to jump through hoops using ->base.

-n


Re: [Numpy-discussion] Aligned / configurable memory allocation

2015-02-10 Thread Julian Taylor
On 10.02.2015 22:33, Nathaniel Smith wrote:
> On 10 Feb 2015 13:10, Antoine Pitrou solip...@pitrou.net wrote:
> >
> > On Tue, 10 Feb 2015 11:26:22 -0800
> > Nathaniel Smith n...@pobox.com wrote:
> > > On 10 Feb 2015 09:11, Antoine Pitrou solip...@pitrou.net wrote:
> > > >
> > > > Hello,
> > > >
> > > > I apologize for pinging the list, but I was wondering if there was
> > > > interest in either of https://github.com/numpy/numpy/pull/5457 (make
> > > > array data aligned by default) or
> > > > https://github.com/numpy/numpy/pull/5470 (make the array data
> > > > allocator configurable)?
> > >
> > > I'm not a fan of the configurable allocator. It adds new public APIs
> > > for us to support, and makes switching to using Python's own memory
> > > allocation APIs more complex. The feature is intrinsically dangerous,
> > > because newly installed deallocators must be able to handle memory
> > > allocated by the previous allocator. (AFAICT the included test case
> > > can crash the test process if you get unlucky and GC runs during it?).
> >
> > It's taken care of in the patch.

Unfortunately it also breaks the ABI on two fronts: by adding a new
member to the public array struct, which needs initializing by users who
do not go through the API, and by removing the ability to use free on
array pointers. Neither is a particularly large break, but they are breaks
nonetheless.

At least for the first issue we should (as for the proposed dtype and
ufunc changes) apply a more generic break by hiding the new internal
members in a new private structure that embeds the public structure
unchanged.
The second issue can probably be ignored, though we could retain it for
POSIX/C11, as those standards wisely decided to make aligned pointers
freeable with free.
That, on the other hand, costs us efficient calloc and realloc (standards
committees are weird sometimes ...)
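
The private-structure idea for the first issue can be sketched with ctypes, which lays out structures the way a C compiler would. The structure and field names below are illustrative stand-ins, not the real PyArrayObject layout.

```python
import ctypes


class PublicArray(ctypes.Structure):
    # Stand-in for the public struct; these fields are illustrative only.
    _fields_ = [("data", ctypes.c_void_p),
                ("nd", ctypes.c_int)]


class PrivateArray(ctypes.Structure):
    # The public struct comes first and is unchanged; new members follow it.
    _fields_ = [("public", PublicArray),
                ("allocator", ctypes.c_void_p)]


full = PrivateArray()
full.public.nd = 2

# Code that only knows the public layout can keep using the same pointer,
# because the embedded public struct sits at offset 0.
as_public = ctypes.cast(ctypes.pointer(full), ctypes.POINTER(PublicArray))
```

Because the public part keeps its size and field offsets, code compiled against the old layout would still read the fields it knows about, while the new member stays out of its view.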


 
> Ah, I see -- I missed that you added an allocator field to
> PyArrayObject. That does reduce my objections to the patch. But I'm
> still not sure what problems this is solving exactly.
>
> Also, if we do decide to add a deallocation callback to PyArrayObject
> then I think we should take advantage of the opportunity to also make
> life easier for C API users who need a custom callback on a case-by-case
> basis and currently have to jump through hoops using ->base.
 





Re: [Numpy-discussion] Silent Broadcasting considered harmful

2015-02-10 Thread Ralf Gommers
On Tue, Feb 10, 2015 at 5:40 PM, Chris Barker chris.bar...@noaa.gov wrote:


> On Tue, Feb 10, 2015 at 12:28 AM, Todd toddr...@gmail.com wrote:
>
> > > So maybe the better way would be not to add warnings to broadcasting
> > > operations, but to overhaul the matrix class to make it more
> > > attractive for numerical linear algebra(?)
> >
> > What about splitting it off into a scikit, or at least some sort of
> > separate package?  If there is sufficient interest in it, it can be
> > maintained there.  If not, at least people can use it as-is.  But there
> > would not be any expectation going forward that the rest of numpy has to
> > work well with it
>
> Well, splitting it off is a good idea,

It's not, that would be a massive backwards compat break. Just leave it as is,
and write this discussion up in a FAQ so we won't keep going in circles on
this topic.

Ralf

> seeing as how it hasn't gotten much love. But if the rest of numpy does
> not work well with it, then it becomes even less useful.



Re: [Numpy-discussion] Silent Broadcasting considered harmful

2015-02-10 Thread Sturla Molden
Chris Barker chris.bar...@noaa.gov wrote:

> Well, splitting it off is a good idea, seeing as how it hasn't gotten much
> love. But if the rest of numpy does not work well with it, then it becomes
> even less useful.

PEP 3118 takes care of that.
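
Presumably the point is that a separately packaged matrix type could still exchange data with NumPy through the PEP 3118 buffer protocol, which NumPy arrays also export. A minimal sketch of the protocol using only the standard library; the values are made up for illustration.

```python
import array

source = array.array("d", [1.0, 2.0, 3.0])

# memoryview obtains the data through the PEP 3118 buffer protocol,
# so no copy is made and the element layout travels with the view.
view = memoryview(source)
layout = (view.format, view.itemsize, view.shape)

view[0] = 10.0  # writes through to the memory owned by `source`
```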



Sturla



Re: [Numpy-discussion] Using numpy on hadoop streaming: ImportError: cannot import name multiarray

2015-02-10 Thread Kartik Kumar Perisetla
Thanks David. But do I need to install virtualenv on every node in the Hadoop
cluster? Actually, I am not sure whether the same namenodes are assigned
to each of my Hadoop jobs. How should I proceed in that scenario?

Thanks for your inputs.
Kartik
On Feb 11, 2015 1:56 AM, Daπid davidmen...@gmail.com wrote:

> On 11 February 2015 at 03:38, Kartik Kumar Perisetla
> kartik.p...@gmail.com wrote:
> > Also, I don't have root access thus, can't install numpy or any other
> > package on cluster
>
> You can create a virtualenv, and install packages on it without
> needing root access. To minimize trouble, you can ensure it uses the
> system packages when available. Here are instructions on how to
> install it:
>
> https://stackoverflow.com/questions/9348869/how-to-install-virtualenv-without-using-sudo
> http://opensourcehacker.com/2012/09/16/recommended-way-for-sudo-free-installation-of-python-software-with-virtualenv/
>
> This does not require root access, but it is probably good to check
> with the sysadmins to make sure they are fine with it.
>
> /David.



Re: [Numpy-discussion] Using numpy on hadoop streaming: ImportError: cannot import name multiarray

2015-02-10 Thread Daπid
On 11 February 2015 at 03:38, Kartik Kumar Perisetla
kartik.p...@gmail.com wrote:
> Also, I don't have root access thus, can't install numpy or any other
> package on cluster

You can create a virtualenv, and install packages on it without
needing root access. To minimize trouble, you can ensure it uses the
system packages when available. Here are instructions on how to
install it:

https://stackoverflow.com/questions/9348869/how-to-install-virtualenv-without-using-sudo
http://opensourcehacker.com/2012/09/16/recommended-way-for-sudo-free-installation-of-python-software-with-virtualenv/

This does not require root access, but it is probably good to check
with the sysadmins to make sure they are fine with it.
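
The same idea can be sketched with the standard-library venv module. This is a stand-in chosen for illustration: venv needs Python 3.3 or later, whereas the thread concerns the third-party virtualenv tool on Python 2.6, and the helper name `create_user_env` is hypothetical.

```python
import os
import venv


def create_user_env(env_dir):
    """Create an environment under env_dir without root privileges."""
    # system_site_packages=True lets the environment fall back to packages
    # that are already installed system-wide; with_pip=False skips
    # bootstrapping pip, so nothing is downloaded or installed here.
    venv.create(env_dir, system_site_packages=True, with_pip=False)
    return os.path.join(env_dir, "pyvenv.cfg")
```

Any writable location works as `env_dir`, for example a directory under the user's home.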


/David.


[Numpy-discussion] Matrix Class

2015-02-10 Thread cjw
It seems to be agreed that there are weaknesses in the existing NumPy matrix
class.

Some problems are illustrated below.

I'll try to put together some suggestions over the coming weeks and would
appreciate comments.

Colin W.

Test Script:

from __future__ import print_function  # print() as a function on Python 2.7
from numpy import mat

if __name__ == '__main__':
    a = mat([4, 5, 6])                # Good
    print('a: ', a)
    b = mat([4, '5', 6])              # Not the expected result
    print('b: ', b)
    c = mat([[4, 5, 6], [7, 8]])      # Wrongly accepted as rectangular
    print('c: ', c)
    d = mat([[1, 2, 3]])
    try:
        d[0, 1] = 'b'                 # Correctly flagged, not numeric
    except ValueError:
        print("d[0, 1]= 'b' # Correctly flagged, not numeric", ' ValueError')
    print('d: ', d)
Result:

*** Python 2.7.9 (default, Dec 10 2014, 12:28:03) [MSC v.1500 64 bit
(AMD64)] on win32. ***
 
a:  [[4 5 6]]
b:  [['4' '5' '6']]
c:  [[[4, 5, 6] [7, 8]]]
d[0, 1]= 'b' # Correctly flagged, not numeric  ValueError
d:  [[1 2 3]]
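
One way the cases for b and c could be caught before the data reaches the matrix constructor is sketched below. This is not one of the promised suggestions; `validate_rows` is a hypothetical helper that expects nested rows and uses only the standard library.

```python
import numbers


def validate_rows(rows):
    """Return rows as a list of lists of floats, or raise on bad input."""
    checked = []
    width = None
    for i, row in enumerate(rows):
        row = list(row)
        if width is None:
            width = len(row)
        elif len(row) != width:
            # Ragged input such as [[4, 5, 6], [7, 8]] is rejected here.
            raise ValueError("row %d has %d entries, expected %d"
                             % (i, len(row), width))
        for value in row:
            # Strings such as '5' are rejected, not silently converted.
            if not isinstance(value, numbers.Real):
                raise TypeError("non-numeric entry %r in row %d" % (value, i))
        checked.append([float(v) for v in row])
    return checked
```

The validated result could then be passed on, for example as `mat(validate_rows([[1, 2, 3]]))`.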
 




